Last week RGCL welcomed Dr Aline Villavicencio from the University of Essex (UK) and the Federal University of Rio Grande do Sul (Brazil). Dr Villavicencio gave a Research Seminar, which was well attended and followed by an interesting discussion.
Upcoming seminars are listed on our website.
Title: Identifying Idiomatic Language with Distributional Semantic Models
Precise natural language understanding requires adequate treatments both of single words and of larger units. However, expressions like compound nouns may display idiomaticity, and while a police car is a car used by the police, a loan shark is not a fish that can be borrowed. Therefore it is important to identify which expressions are idiomatic, and which are not, as the latter can be interpreted from a combination of the meanings of their component words while the former cannot. In this talk I discuss the ability of distributional semantic models (DSMs) to capture idiomaticity in compounds, by means of a large-scale multilingual evaluation of DSMs in French and English. A total of 816 DSMs were constructed in 2,856 evaluations. The results obtained show a high correlation with human judgments about compound idiomaticity (Spearman’s ρ=.82 in one dataset), indicating that these models are able to successfully detect idiomaticity.
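For readers who want a feel for this kind of evaluation, here is a minimal sketch in Python. It is not the speaker's implementation: it assumes one common setup in which a compound's predicted compositionality is the cosine similarity between the compound's own vector and the sum of its component word vectors, and the predictions are then compared with human ratings using Spearman's ρ. The three-dimensional vectors, the human ratings, and the third entry `example_compound` are all invented for illustration, as are the function names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def compositionality_score(compound_vec, head_vec, modifier_vec):
    """Similarity between the compound's vector and the sum of its parts.
    Additive composition is an assumption made for this sketch."""
    composed = [h + m for h, m in zip(head_vec, modifier_vec)]
    return cosine(compound_vec, composed)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# Toy data, invented for illustration only: (compound, head, modifier) vectors.
toy_vectors = {
    "police car": ([1, 1, 0], [1, 0, 0], [0, 1, 0]),
    "loan shark": ([0, 0, 1], [1, 0, 0], [0, 1, 0]),
    "example_compound": ([1, 0, 1], [1, 0, 0], [0, 1, 0]),
}
# Invented human ratings (higher = more literal, lower = more idiomatic).
toy_human_ratings = {"police car": 5.0, "loan shark": 0.5, "example_compound": 2.5}

names = list(toy_vectors)
predicted = [compositionality_score(*toy_vectors[name]) for name in names]
human = [toy_human_ratings[name] for name in names]
rho = spearman(predicted, human)
print(f"Spearman's rho on toy data: {rho:.2f}")
```

In a real evaluation the vectors would come from a DSM trained on a corpus, and the ratings from an annotated dataset of compounds; a high ρ would then indicate that the model ranks compounds by idiomaticity much as human annotators do.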