Author Archives: riilp

Dr. Aline Villavicencio to visit from the University of Essex

Dr. Aline Villavicencio from the University of Essex (UK) and the Federal University of Rio Grande do Sul (Brazil) is visiting RGCL in April. She will give a talk on “Identifying Idiomatic Language with Distributional Semantic Models” on 19 April 2018; the abstract is below. If you are interested in attending the talk, please get in touch for more details.

Identifying Idiomatic Language with Distributional Semantic Models
Precise natural language understanding requires adequate treatments both of single words and of larger units. However, expressions like compound nouns may display idiomaticity, and while a police car is a car used by the police, a loan shark is not a fish that can be borrowed. Therefore it is important to identify which expressions are idiomatic, and which are not, as the latter can be interpreted from a combination of the meanings of their component words while the former cannot. In this talk I discuss the ability of distributional semantic models (DSMs) to capture idiomaticity in compounds, by means of a large-scale multilingual evaluation of DSMs in French and English. A total of 816 DSMs were constructed in 2,856 evaluations. The results obtained show a high correlation with human judgments about compound idiomaticity (Spearman’s ρ = .82 in one dataset), indicating that these models are able to successfully detect idiomaticity.
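
As a rough illustration of the kind of evaluation the abstract describes, the Python sketch below scores each compound by the cosine similarity between its own vector and the sum of its component word vectors, then compares those scores with human ratings using Spearman's rank correlation. This is only a sketch under stated assumptions: the abstract does not say how the models predict idiomaticity, so the additive composition, the function names, the three-dimensional vectors, the human ratings and the third compound ("night owl") are all hypothetical and are not taken from the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compositionality(compound_vec, head_vec, modifier_vec):
    """Assumed scoring: similarity of the compound to the sum of its parts."""
    composed = [h + m for h, m in zip(head_vec, modifier_vec)]
    return cosine(compound_vec, composed)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Toy data, invented for illustration only:
# (compound vector, head vector, modifier vector, human compositionality rating)
toy = {
    "police car": ([0.9, 0.1, 0.8], [1.0, 0.0, 0.6], [0.7, 0.2, 0.9], 4.8),
    "loan shark": ([0.1, 0.9, 0.2], [0.9, 0.1, 0.1], [0.2, 0.3, 0.9], 0.6),
    "night owl": ([0.2, 0.8, 0.3], [0.8, 0.2, 0.1], [0.3, 0.2, 0.9], 1.2),
}

predicted = [compositionality(c, h, m) for c, h, m, _ in toy.values()]
human = [rating for _, _, _, rating in toy.values()]
rho = spearman(predicted, human)
print(f"Spearman's rho on the toy data: {rho:.2f}")
```

A low score marks a compound whose vector lies far from the combination of its parts, which is how an idiomatic expression such as loan shark would be expected to behave under this assumption.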

RGCL Staff Research Seminar

In November, Dr Constantin Orasan gave a staff research seminar on his current and future research, presenting a user study on quality estimation for professional translators. The paper was well received and was followed by an interesting debate and questions.

Title: Quality estimation for professional translators: a user study


Postediting of machine translation output has become an important step in the workflows employed by translation companies. The idea behind postediting is that it is possible to improve the productivity of professional translators by asking them to correct the output of machine translation systems rather than to translate from scratch. In cases in which the quality of the translation is poor, this is not necessarily true. The field of quality estimation could prove useful for deciding which sentences can be postedited and which should be translated from scratch. This talk will report the results of a user study which recorded the productivity of four professional translators when they were asked to postedit and translate sentences in different scenarios.

Our results show that quality estimation information, when accurate, improves postediting efficiency. The analysis has also raised a number of questions which are worth investigating.

RGCL welcomes Johanna Monti

In the middle of November, RGCL welcomed Johanna Monti, an Associate Professor of Modern Languages Teaching at the University of Naples “L’Orientale”. Her research activities are in the field of hybrid approaches to Machine Translation and NLP applications. Whilst Johanna was here, she gave two lectures, one on multiword expressions and one on gender issues in machine translation. The lectures were well received and were also attended by the Research Group’s MA students.

TITLE: PARSEME-It Corpus: An Annotated Corpus of Verbal Multiword Expressions in Italian

ABSTRACT: This talk outlines the development of a new language resource for Italian, namely the PARSEME-It VMWE corpus, annotated with Italian MWEs of a particular class: verbal multiword expressions (VMWEs). The PARSEME-It VMWE corpus has been developed by the PARSEME-IT research group in the framework of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (Savary et al., 2017), a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for verbal multiword expressions in 18 languages, including Italian. Notably, multiword expressions are a difficult lexical construction for Natural Language Processing (NLP) tools, such as parsers and machine translation engines, to identify, model and treat, mainly because of their non-compositionality. Verbal multiword expressions are particularly challenging because they have different syntactic structures (prendere una decisione ‘make a decision’, decisioni prese precedentemente ‘decisions made previously’), may be continuous or discontinuous (andare e venire versus andare in malora in Luigi ha fatto andare la società in malora), and may have both a literal and a figurative meaning (abboccare all’amo ‘bite the hook’ or ‘be deceived’). The talk will describe the state of the art in VMWE annotation and identification for the Italian language, the methodology, the Italian VMWE categories taken into account for the annotation task, the corpus, the annotation process and the results.


TITLE: Gender Issues in Machine Translation

ABSTRACT: Machine Translation is one of the most widely used Artificial Intelligence applications on the Internet: it is so widespread in online services of various types that sometimes users do not realize that they are using the results of an automatic translation process. In spite of the remarkable progress achieved in this field over the last twenty years, thanks to the enhanced calculating capacity of computers and advanced technologies in the field of Natural Language Processing (NLP), machine translation systems, even the most widely used ones on the net such as Google Translate, have high error rates. In state-of-the-art MT systems, whether based on linguistic data like Systran, on statistical approaches like Google Translate or on the recent neural approach, the translation of gender still represents a recurrent source of mistranslations: incorrect gender attribution to proforms (personal pronouns, relative pronouns, among others), reproduction of gender stereotypes and overuse of male pronouns are among the most frequent problems.