Last week RGCL welcomed Dr Aline Villavicencio from the University of Essex (UK) and the Federal University of Rio Grande do Sul (Brazil). Dr Villavicencio gave a Research Seminar, which was well attended and followed by an interesting discussion.
Upcoming Seminars can be found on our website here.
Title: Identifying Idiomatic Language with Distributional Semantic Models
Abstract:
Precise natural language understanding requires adequate treatments both of single words and of larger units. However, expressions like compound nouns may display idiomaticity, and while a police car is a car used by the police, a loan shark is not a fish that can be borrowed. Therefore it is important to identify which expressions are idiomatic, and which are not, as the latter can be interpreted from a combination of the meanings of their component words while the former cannot. In this talk I discuss the ability of distributional semantic models (DSMs) to capture idiomaticity in compounds, by means of a large-scale multilingual evaluation of DSMs in French and English. A total of 816 DSMs were constructed in 2,856 evaluations. The results obtained show a high correlation with human judgments about compound idiomaticity (Spearman’s ρ=.82 in one dataset), indicating that these models are able to successfully detect idiomaticity.
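For readers unfamiliar with this kind of evaluation, here is a minimal sketch of one common way to set it up. It is not necessarily the method used in the talk. It assumes a compound's compositionality can be approximated by the cosine similarity between the compound's own vector and the sum of its component word vectors, and that the predictions are then compared with human ratings using Spearman's ρ. All vectors and ratings below are invented toy values, and the third compound ("night owl") is a hypothetical addition used only so that there are enough items to rank.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compositionality(compound_vec, modifier_vec, head_vec):
    """Score a compound by comparing its vector with the sum of its parts.

    Additive composition is an assumption made for this sketch.
    """
    composed = [m + h for m, h in zip(modifier_vec, head_vec)]
    return cosine(compound_vec, composed)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(x), ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Toy 3-dimensional vectors, invented for illustration only.
# Each entry: (compound vector, modifier vector, head vector).
toy_vectors = {
    "police car": ([0.9, 0.8, 0.1], [0.1, 1.0, 0.1], [1.0, 0.2, 0.0]),
    "night owl": ([0.5, 0.2, 0.7], [0.2, 0.9, 0.1], [0.9, 0.1, 0.2]),
    "loan shark": ([0.1, 0.1, 1.0], [0.0, 1.0, 0.2], [1.0, 0.0, 0.1]),
}

# Invented human compositionality ratings (higher = more literal).
toy_human_ratings = {"police car": 4.8, "night owl": 2.0, "loan shark": 1.0}

compounds = list(toy_vectors)
predicted = [compositionality(*toy_vectors[c]) for c in compounds]
human = [toy_human_ratings[c] for c in compounds]

for name, score in zip(compounds, predicted):
    print(f"{name}: predicted compositionality {score:.3f}")
print(f"Spearman's rho against toy ratings: {spearman(predicted, human):.2f}")
```

A low score suggests the compound's distribution in text differs from what its parts would predict, which is the signal of idiomaticity described in the abstract. Real evaluations use vectors learned from large corpora and ratings collected from annotators.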