People are still unrivaled when it comes to evaluating the quality of a piece of text – so says Google in their article on Evaluating Natural Language Generation with BLEURT. We here at Tilti Multilingual tend to agree that quality translations can be achieved only by employing professional translators and proofreaders. Nevertheless, machine (automated) translations are rapidly improving, and here is their state as of 2020.

Supported languages

As of now, the most prominent automated translation software supports 109 languages:

  • Afrikaans
  • Albanian
  • Amharic
  • Arabic
  • Armenian
  • Azerbaijani
  • Basque
  • Belarusian
  • Bengali
  • Bosnian
  • Bulgarian
  • Catalan
  • Cebuano
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Corsican
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Esperanto
  • Estonian
  • Finnish
  • French
  • Frisian
  • Galician
  • Georgian
  • German
  • Greek
  • Gujarati
  • Haitian Creole
  • Hausa
  • Hawaiian
  • Hebrew
  • Hindi
  • Hmong
  • Hungarian
  • Icelandic
  • Igbo
  • Indonesian
  • Irish
  • Italian
  • Japanese
  • Javanese
  • Kannada
  • Kazakh
  • Khmer
  • Kinyarwanda
  • Korean
  • Kurdish
  • Kyrgyz
  • Lao
  • Latin
  • Latvian
  • Lithuanian
  • Luxembourgish
  • Macedonian
  • Malagasy
  • Malay
  • Malayalam
  • Maltese
  • Maori
  • Marathi
  • Mongolian
  • Myanmar (Burmese)
  • Nepali
  • Norwegian
  • Nyanja (Chichewa)
  • Odia (Oriya)
  • Pashto
  • Persian
  • Polish
  • Portuguese (Portugal, Brazil)
  • Punjabi
  • Romanian
  • Russian
  • Samoan
  • Scots Gaelic
  • Serbian
  • Sesotho
  • Shona
  • Sindhi
  • Sinhala (Sinhalese)
  • Slovak
  • Slovenian
  • Somali
  • Spanish
  • Sundanese
  • Swahili
  • Swedish
  • Tagalog (Filipino)
  • Tajik
  • Tamil
  • Tatar
  • Telugu
  • Thai
  • Turkish
  • Turkmen
  • Ukrainian
  • Urdu
  • Uyghur
  • Uzbek
  • Vietnamese
  • Welsh
  • Xhosa
  • Yiddish
  • Yoruba
  • Zulu

These cover all the biggest languages of the world, with more than 95% of the Earth’s population speaking at least one of them. Thus, most people in most situations will probably be able to use automated translations for their needs.

That being said, 109 supported languages out of approximately 7000 existing ones is only about 1.6%. Many of the unsupported languages are minor ones, spoken by populations of about 1000 people, and many of them are endangered. Nevertheless, if you need one of those less popular languages, human translators are your only option.
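The share quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two figures come from this article, the total of 7000 is itself an approximation, and the function name is made up for illustration.

```python
# Figures taken from the article; 7000 is an approximate count of existing languages.
SUPPORTED_LANGUAGES = 109
ESTIMATED_WORLD_LANGUAGES = 7000


def coverage_percent(supported: int, total: int) -> float:
    """Return the share of supported languages as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * supported / total


share = coverage_percent(SUPPORTED_LANGUAGES, ESTIMATED_WORLD_LANGUAGES)
print(f"{share:.2f}% of the world's languages are supported")
```

The result is roughly 1.56%, so about one and a half percent of all languages, even though those languages reach the vast majority of speakers.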

The fact that these languages are available in translation engines, however, is mostly meaningless, because what really matters is how well the machine is able to work with them, and that depends on how well the language is resourced.

Language resourcing

Language resources are collections of texts (including transcribed speech) in a particular language. In the context of machine translation, language resources are used to train translation engines. The more resources a language has, the better the translations into that language are going to be.

As of 2020, the top 10 most well-resourced languages (apart from English) are:

  1. German
  2. Czech
  3. Japanese
  4. Russian
  5. French
  6. Arabic
  7. Spanish
  8. Icelandic
  9. Latin
  10. Italian

Most of the major European languages and the biggest Asian languages are significantly resourced as well.

Short automated translations to these languages can yield pretty decent results, although translating anything more than a paragraph or two is still very risky. This is especially true for the translation of documents or official text localization, where precision and terminology management are essential. Languages outside of this scope cannot be translated with any degree of accuracy yet.

What it can do

As of 2020, Google Translate and other similar engines are good at translating the overall meaning of a text or, at the opposite extreme, very specific and isolated terminology.

It is no secret that automated translation can aid greatly in day-to-day conversations with people you don’t share a common language with. Not only that: it can also convey different aspects of the same idea, i.e. it handles synonyms and is able to deliver a decently rich and varied translation (source: Evaluating Natural Language Generation with BLEURT). So, for understanding the general message of a text, automated translation is the way to go.

Alternatively, when translating single words or short phrases, Google Translate almost always offers a variety of possible translations ranging from mundane to terminological. This allows users to pick an option and verify it later, or at least to be aware that a word can have more than one meaning. This makes automated translation engines a great aid to those who already have a limited knowledge of the target language and wish to use additional sources to check whether their understanding of a text is correct.

What it cannot do

Another quote from Google summarizes the disadvantages of the translation engines better than anything: “Although these are impressive strides forward for a machine, one must remember that, especially for low-resource languages, automatic translation quality is far from perfect. These models still fall prey to typical machine translation errors, including poor performance on particular genres of subject matter (‘domains’), conflating different dialects of a language, producing overly literal translations, and poor performance on informal and spoken language.” (Google, taken from Recent Advances in Google Translate.)

Automated translation is inadequate when the text deals with specialized topics or when the translation requires the utmost precision. Medical and legal texts, for example, should never be translated by translation engines. Their terminology requires a highly trained professional translator; otherwise the legal status, financial well-being or health of a client is put at risk.

Another thing machine translation engines are poor at is understanding cultural context. This ties in with the topic of specific terminology, and is well demonstrated by how Google tries to correct the gender bias in their translations (source: A Scalable Approach to Reducing Gender Bias in Google Translate). While much progress has been made over the years, it is still apparent that machine localization cannot actually localize: it does not understand the broader context of a text, because that context is often not mentioned in the text itself. Various cultural peculiarities often constitute the implicit knowledge of the speaker, and thus cannot be deduced by a machine, making the translation culturally inaccurate or inappropriate.

Should you use it?

We think that using an automated translation is permissible in the following cases:

  • Your text is very short, 1 to 3 sentences.
  • You wish to translate a single word or a phrase.
  • You already know the target language, and need to verify your own translation for non-professional purposes.
  • You only need a general understanding of a text and do not care about the details.

We also think that machine translation should never be used in the following cases:

  • You need a high-quality translation.
  • You intend to use your translation in any professional context.
  • Your translation requires the most precise terminology management possible, e.g. in legal and medical texts.
  • You need to be sure that the text conveys a specific message rather than translates the source verbatim, e.g. when trying to enter a foreign market.
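For readers who prefer to see the two checklists above as a single rule, here is a minimal Python sketch. Every name in it (`TranslationJob`, its fields, `machine_translation_ok`) is hypothetical and merely mirrors the bullet points; it assumes that any "never" condition overrides the "permissible" ones.

```python
from dataclasses import dataclass


@dataclass
class TranslationJob:
    # Hypothetical fields that mirror the two checklists above.
    sentence_count: int                      # a single word or phrase counts as 1
    gist_only: bool = False                  # a general understanding is enough
    self_check: bool = False                 # verifying your own non-professional translation
    needs_high_quality: bool = False
    professional_use: bool = False
    needs_precise_terminology: bool = False  # e.g. legal and medical texts
    needs_localization: bool = False         # e.g. entering a foreign market


def machine_translation_ok(job: TranslationJob) -> bool:
    """Return True if the checklists allow machine translation for this job."""
    # Any "never" condition rules machine translation out.
    if (job.needs_high_quality or job.professional_use
            or job.needs_precise_terminology or job.needs_localization):
        return False
    # Otherwise a single "permissible" condition is enough.
    return job.sentence_count <= 3 or job.gist_only or job.self_check


casual_note = TranslationJob(sentence_count=2)
contract = TranslationJob(sentence_count=2, needs_precise_terminology=True)
print(machine_translation_ok(casual_note), machine_translation_ok(contract))
```

A job that matches neither list (for example, a long text with no special requirements) is treated as "not permissible" here, which is a cautious reading rather than something the checklists state explicitly.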

As you can see, the answer to “To use or not to use?” depends on your aims, your context and your scope.
