Gretchen McCulloch is WIRED's resident linguist. She's the cocreator of Lingthusiasm, a podcast that's enthusiastic about linguistics. Her book Because Internet: Understanding the New Rules of Language is due out in July 2019 from Penguin.

But in the murky middle ground are a couple hundred languages whose speakers number in the millions. These midsize languages are still fairly widely spoken, but they have vastly inconsistent levels of support online. There's Swedish, which has 9.6 million speakers, the third-largest Wikipedia with over 3 million articles, and support in Google Translate, Bing Translate, Facebook, Siri, YouTube captions, and so on. But there's also Odia, the official language of the Odisha state in India, with 38 million speakers, which has no presence in Google Translate. And Oromo, a language spoken by some 34 million people, mostly in Ethiopia, has just 772 articles in its Wikipedia.

Why do Greek, Czech, Hungarian, and Swedish, with their 8 to 13 million speakers, have Google Translate support and robust Wikipedia presences, while languages the same size or larger, like Bhojpuri (51 million), Fula (24 million), Sylheti (11 million), Quechua (9 million), and Kirundi (9 million) languish in technological obscurity?

Part of the reason is that Greek, Czech, Hungarian, and Swedish are among the 24 official languages of the European Union, which means that a small horde of human translators translate many official European Parliament documents every year. Human-translated documents make a great base for what linguists call a parallel corpus: a large mass of text that's equivalent, sentence-by-sentence, in multiple languages.

Machine translation engines use parallel corpora to figure out regular correspondences between languages: if "regering" or "κυβέρνηση" or "kormány" or "vláda" all frequently appear in parallel to "government," then the machine concludes these words are equivalent. To be reasonably effective, machine translation requires an enormous parallel corpus for each language.
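To make the correspondence-finding idea above concrete, here is a minimal Python sketch of how word pairs that keep showing up in parallel can be scored. Everything in it is an assumption for illustration: the four toy English-Swedish sentence pairs are invented (and not meant to be grammatical), the function name `best_translations` is hypothetical, and the Dice overlap score is just one simple choice. Real translation engines rely on far larger corpora and more sophisticated alignment or neural models.

```python
from collections import Counter

# Invented toy "parallel corpus": each English line is paired with a
# word-for-word Swedish line. Illustrative only, not real EU data.
PARALLEL_CORPUS = [
    ("government approves budget", "regering godkänner budget"),
    ("government rejects proposal", "regering avvisar förslag"),
    ("parliament approves proposal", "parlament godkänner förslag"),
    ("parliament rejects budget", "parlament avvisar budget"),
]


def best_translations(pairs):
    """Guess, for each source word, the target word it co-occurs with most.

    Scores each (source, target) word pair with the Dice coefficient:
    2 * (sentence pairs containing both) / (pairs with source + pairs with target).
    """
    src_count = Counter()   # sentence pairs in which each source word appears
    tgt_count = Counter()   # sentence pairs in which each target word appears
    pair_count = Counter()  # sentence pairs in which both words appear

    for src_sentence, tgt_sentence in pairs:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update((s, t) for s in src_words for t in tgt_words)

    best = {}
    for (s, t), together in pair_count.items():
        score = 2 * together / (src_count[s] + tgt_count[t])
        # Keep the highest-scoring candidate seen so far for each source word.
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


if __name__ == "__main__":
    for word, (translation, score) in sorted(best_translations(PARALLEL_CORPUS).items()):
        print(f"{word} -> {translation} (score {score:.2f})")
```

With only four sentence pairs the sketch can already tie "government" to "regering," because the two words always appear together and never apart. With less tidy text, it takes vastly more data before such patterns become reliable, which is why the size of the parallel corpus matters so much.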