NLP has focused mostly on Indo-European languages, optimizing models for their linguistic features. However, most of the world's roughly 7,000 languages have been left out, including the Indigenous languages of the Americas. A critical phenomenon in many American languages is polysynthesis, which clusters most of the information of a sentence into its verb. From a computational point of view, these polysynthetic words can be challenging to handle because of the high data sparsity they produce. In this talk, we will review the morphological segmentation techniques that aim to reduce this sparsity and explore their impact on Machine Translation.
Manuel Mager is a Ph.D. candidate at the University of Stuttgart (Institute for Natural Language Processing), Germany. He graduated in Informatics from the National Autonomous University of Mexico (UNAM) and completed a Master's in Computer Science at the Metropolitan Autonomous University (UAM), Mexico. His work focuses on Natural Language Processing for low-resource languages, morphological analysis and translation of polysynthetic languages, and code-switching. His main aim is to bring the Indigenous languages of the Americas into the current NLP community and to extend the field's advances to all languages of the world.