Recent years have seen increasing interest in automatic methods for cleaning translation memories (TMs). The problem is of great interest to the translation industry, because many translation memories have not been adequately curated and therefore contain incorrect translations. We argue that progress in TM cleaning tools should be grounded in translator-oriented surveys, which would give a better understanding of what constitutes a good TM unit.
In this talk, I will define the task, discuss its importance for the translation industry, and outline the progress made in recent years toward an automatic solution. A closely related line of research deals with identifying sentences that are not properly aligned in parallel corpora mined from the web. I will examine the similarities between these two tasks and argue that the two research communities should share tools and algorithms.
Drawing on his experience at the company Translated, the presenter will also offer insights into the pitfalls of, and the opportunities for, collaboration between academia and industry.
Eduard Barbu is a researcher in the language technology group at the University of Tartu. He holds a Ph.D. in cognitive science from the University of Trento, Italy (2010). His current research interests are the interpretation of machine learning output and information extraction.
Eduard has worked in both academia and industry in four countries: Romania, Italy, Spain, and Estonia. He has experience in many areas of NLP, including ontology building, information extraction, and coreference resolution. He has worked on seven European projects and on several national projects related to technology transfer. He is the author of several NLP tools, including the Estonian Coreference System, a coreference resolution system for Estonian, and TM Cleaner, a tool for cleaning translation memories.