Seminar: Automatic Extraction and Translation of Multiword Expressions


Speaker: Shiva Taslimipoor
Title: Automatic Extraction and Translation of Multiword Expressions
Date and time: Wednesday, March 9th, 2pm
Room: MD083, City Campus

Abstract: Multiword expressions (MWEs) are defined as idiosyncratic interpretations that cross word boundaries (or spaces), e.g. "frying pan", "take a look" and "take part". They have distinct syntactic and semantic properties that call for special treatment within a computational system. Although the computational treatment of MWEs has received considerable attention in computational linguistics, only a few approaches have drawn on bilingual resources for their automatic treatment. Among bilingual resources, comparable corpora are known to be helpful, especially for resource-poor languages; however, their application to actual NLP tasks has been rather limited. As a result, the automatic extraction and translation of MWEs from comparable corpora, which are the richest and most readily available such resource, remains an under-explored topic. We believe there is considerable ground to be covered and promising new results to be achieved in this field.

To identify MWEs automatically, we use statistical measures to rank candidate expressions in both English and Spanish: the higher an expression is ranked, the more likely it is to be an MWE. To extract translation equivalents of MWEs automatically from comparable corpora, we propose extending a state-of-the-art distributional similarity method, word embeddings, so that it uses contexts in a bilingual space to measure the similarity between expressions in different languages. As work in progress, we propose using the translation equivalents of MWEs to improve the performance of the statistical measures used to identify them.
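The abstract does not name the statistical measures or give details of the bilingual embedding method, so the following Python sketch is illustrative only. It assumes pointwise mutual information (PMI) as the ranking measure for adjacent word pairs, and it assumes that English and Spanish expressions already have vectors in a shared bilingual space, picking the nearest Spanish candidate by cosine similarity. The function names and toy vectors are hypothetical, and building the bilingual space itself, which is the focus of the talk, is not shown.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Assumption: PMI stands in for the unspecified "statistical measures";
    a higher score is read as "more likely to be an MWE".
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for very rare pairs, so skip them
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_translation(source_vector, target_vectors):
    """Return the target-language expression whose vector is closest
    (by cosine) to the source vector in the assumed shared bilingual space."""
    return max(target_vectors,
               key=lambda expr: cosine(source_vector, target_vectors[expr]))


if __name__ == "__main__":
    corpus = ("we take part in the race and the team will "
              "take part in the game and the fans").split()
    for pair, score in rank_bigrams_by_pmi(corpus):
        print(" ".join(pair), round(score, 2))

    # Hypothetical toy vectors, assumed to already live in one bilingual space.
    english = {"take part": [0.9, 0.1, 0.0]}
    spanish = {
        "participar": [0.85, 0.15, 0.05],
        "echar un vistazo": [0.1, 0.9, 0.1],
        "sartén": [0.0, 0.2, 0.95],
    }
    print(best_translation(english["take part"], spanish))
```

On this toy corpus, pairs seen only once are filtered out, and "take part" outranks a frequent but less tightly bound pair such as "in the". In practice, several association measures are usually compared, and the embedding vectors would be learned from the comparable corpora rather than written by hand.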