Lost in translation: Why Google Translate often gets Yorùbá — and other languages — wrong

13 March 2020

The English language has dominated online discourse as the “universal” language of communication since the inception of the internet. As of February 2020, over half of the websites on the internet are in English, according to WebTech3.

But as more people who speak other languages get online, their arrival has sparked a linguistic digital revolution: immediate access to English translations of many languages at the click of a button.

Many tech companies have recently put effort into documenting non-English words on the internet, paving the way for the digitization of multiple languages. Google, Yoruba Names, Masakhane MT and ALC are examples of companies and start-ups that have been trying to marry technology with non-English languages.

In late February 2020, after a four-year hiatus on adding new languages, Google announced that it would add five to its Google Translate service: Kinyarwanda, Uighur, Tatar, Turkmen and Odia.

But have you ever clicked on the translation option and realized that the English translation is, at best, just OK? And at worst, not accurate at all?

This kind of language translation and access work comes with many controversies and difficulties.

Twitter offers Yorùbá-to-English translation via Google Translate wherever it can, and usually the outcome isn’t totally bad: perhaps a few words are correct.

The reason for these challenges is that tech companies usually source the linguistic data they use for English translation from the internet. This data may work for some languages, but languages like Yorùbá and Ìgbò, two major languages of Nigeria, are challenging because the accent marks that indicate tone on their words are often missing or inaccurate.

Read the full article on Global Voices Online.
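
To illustrate why missing tone marks matter, here is a minimal Python sketch. It is not how Google Translate or any other service works. It only shows that when accents are dropped, several distinct Yorùbá words collapse into the same string. The helper name `strip_tone_marks` is hypothetical, and the English glosses in the comments are commonly cited meanings, included only as an illustration.

```python
import unicodedata


def strip_tone_marks(text: str) -> str:
    """Remove all combining marks (tone accents and under-dots) from text.

    Hypothetical helper: it mimics what happens when text is typed or
    scraped without diacritics.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks, so skip those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Three Yorùbá words that differ only in tone (glosses are illustrative).
words = {
    "igba": "two hundred",
    "igbá": "calabash",
    "ìgbà": "time",
}

# Without tone marks, all three spellings become identical.
collapsed = {strip_tone_marks(word) for word in words}

for word, gloss in words.items():
    print(f"{word} ({gloss}) -> {strip_tone_marks(word)}")
```

After stripping, the three words can no longer be told apart, so a system trained on unaccented web text has to guess the intended meaning from context alone.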