Congratulations to Shiva Taslimipoor who successfully defended her thesis, entitled ‘Automatic Identification and Translation of Multiword Expressions’, on Tuesday. She is pictured (left-right) with Professor Dew Harrison (Chair of the viva), Dr Aline Villavicencio (External Examiner), Professor Mike Thelwall (Internal Examiner) and Professor Ruslan Mitkov (Director of Studies). We are all thrilled for Shiva and wish her the very best for her next venture!
Next week, Shiva Taslimipoor is to defend her thesis in her viva voce, which will conclude her four-year PhD with the Research Group in Computational Linguistics. In the run-up to her viva, Shiva presented her thesis and the research she has undertaken to the group.
Title: Automatic Identification and Translation of Multiword Expressions
Abstract: Multiword Expressions (MWEs) belong to a class of phraseological phenomena that is ubiquitous in the study of language. They are heterogeneous lexical items consisting of more than one word and feature lexical, syntactic, semantic and pragmatic idiosyncrasies. Scholarly research in MWEs immensely benefits both natural language processing (NLP) applications and end users. Along with the improvement of general NLP techniques, the methodologies to deal with MWEs should be improved.
This thesis involves designing new methodologies to identify and translate MWEs. In order to deal with MWE identification, we first develop datasets of annotated verb-noun MWEs in context. We then propose a method which employs word embeddings to disambiguate between literal and idiomatic usages of expressions. The existence of expression types with various idiomatic and literal distributions leads us to re-examine their modelling and evaluation.
We propose a type-aware train and test splitting approach to prevent models from overfitting and avoid misleading evaluation results.
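As an illustration only, the sketch below shows one possible reading of a type-aware split: every instance of a given expression type is placed entirely in the training set or entirely in the test set, so a model cannot score well simply by memorising types it has already seen. The function name `type_aware_split`, the `(expression_type, sentence, label)` tuple layout and the example data are all assumptions made for this sketch, not details taken from the thesis.

```python
import random
from collections import defaultdict


def type_aware_split(instances, test_fraction=0.2, seed=0):
    """Split instances so that no expression type occurs in both sets.

    Each instance is assumed to be a tuple whose first element is the
    expression type (hypothetical layout: (type, sentence, label)).
    """
    # Group all instances under their expression type.
    by_type = defaultdict(list)
    for instance in instances:
        by_type[instance[0]].append(instance)

    # Shuffle the types (not the instances) reproducibly.
    types = sorted(by_type)
    random.Random(seed).shuffle(types)

    # Reserve a fraction of the types, at least one, for testing.
    n_test = max(1, round(len(types) * test_fraction))
    test_types = set(types[:n_test])

    train = [i for t in types if t not in test_types for i in by_type[t]]
    test = [i for t in types if t in test_types for i in by_type[t]]
    return train, test


# Hypothetical toy data: verb-noun expressions with usage labels.
examples = [
    ("take place", "The meeting took place on Monday.", "idiomatic"),
    ("take place", "Please take your place in the queue.", "literal"),
    ("make sense", "That makes sense to me.", "idiomatic"),
    ("give way", "The bridge gave way.", "idiomatic"),
    ("give way", "Give way to oncoming traffic.", "idiomatic"),
    ("have time", "Do you have time today?", "literal"),
    ("lose heart", "Do not lose heart.", "idiomatic"),
]

train_set, test_set = type_aware_split(examples, test_fraction=0.2, seed=0)
```

Splitting at the level of types rather than individual sentences means the test set measures how well a model handles expressions it has never seen, which is the kind of generalisation the abstract is concerned with.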
Identification of MWEs in context can be modelled with tagging methodologies. To this end, we devise a new neural network architecture, which is a combination of convolutional neural networks and long short-term memory (LSTM) networks with an optional conditional random field layer on top. We conduct extensive evaluations on several languages, demonstrating better performance than state-of-the-art systems. Experiments show that the generalisation power of the model in predicting unseen MWEs is outstanding.
In order to find translations for verb-noun MWEs, we propose a bilingual distributional similarity approach derived from a word embedding model that supports arbitrary context. The technique is devised to extract translation equivalents from comparable corpora, which are an alternative resource to costly parallel corpora. We finally conduct a series of experiments to investigate the effects of size and quality of comparable corpora on automatic extraction of translation equivalents.
The 2nd Conference on Recent Advances in Artificial Intelligence (RAAI) took place on June 25-26 in Bucharest, Romania. It was organized by Prof. Liviu P. Dinu and colleagues from the Faculty of Mathematics and Computer Science of the University of Bucharest.
The conference lasted two days. The first day focused on Natural Language Processing with three invited speakers: Cornelia Caragea from Kansas State University, Marius Pasca from Google, and Marcos Zampieri from the University of Wolverhampton, as well as several presenters from Romania and from abroad. The second day featured presentations on computer vision and other areas of A.I. including a panel discussion with researchers and developers from local A.I. companies such as Bitdefender.
Marcos’ presentation, entitled “Automatic Language Identification: A Solved Task? Modelling Dialectal Variation in Language Identification Systems”, provided an overview of the main challenges in language identification with a special focus on dialectal variation, taking into account the lessons learned in the five years of the VarDial workshop.
Last week, the RGCL PhD Students presented their research to their peers and staff members from across the University. The posters were well received.
Richard Evans: ‘Sentence Simplification for Language Processing’
My research is about the development and evaluation of automatic methods for the analysis and simplification of sentences. The analysis step is shallow, making it efficient and robust when processing long complex sentences. The simplification method is iterative, allowing it to simplify sentences containing multiple occurrences and multiple types of complexity.
Ahmed Omer: ‘Arabic Stylometry’
Computational Stylometry is the computer analysis of writing style. Successful techniques for computational stylometry characterise the texts under study by large numbers of linguistic features, such as the frequencies of words, characters, or sentence lengths.
The degree of stylistic difference between a pair of documents can then be found by any of a number of measures which compare the sets of linguistic features for each document.
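As a rough illustration of the idea, the sketch below builds a very simple feature set for each document (relative word frequencies) and compares two documents with the Manhattan distance. This is only one of many possible feature sets and measures, chosen here for brevity; the function names `word_profile` and `manhattan_distance` and the sample sentences are hypothetical and are not taken from the research described above.

```python
import re
from collections import Counter


def word_profile(text):
    """Map each word in the text to its relative frequency."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def manhattan_distance(profile_a, profile_b):
    """Sum the absolute frequency differences over all words in either profile."""
    vocabulary = set(profile_a) | set(profile_b)
    return sum(
        abs(profile_a.get(word, 0.0) - profile_b.get(word, 0.0))
        for word in vocabulary
    )


# Hypothetical sample documents.
doc_a = "the cat sat on the mat"
doc_b = "the dog sat on the log"

distance = manhattan_distance(word_profile(doc_a), word_profile(doc_b))
```

With relative frequencies, the distance is 0 for two documents with identical word distributions and reaches its maximum of 2 when they share no words at all. Practical stylometric systems typically use far richer feature sets than this.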
Omid Rohanian: ‘NLP Approaches to estimating Text Difficulty’
I am exploring NLP approaches to investigating text difficulty at the level of concepts. I regard conceptual difficulties as linguistic phenomena that cause some form of complication in language understanding. This complication can manifest itself in elongation of processing, which could be captured in eye tracking data, or in the form of misunderstanding the intended meaning. Conceptual difficulties alter literal meaning, and in order to comprehend them, one might need to do additional processing.
If you are interested in pursuing a PhD with the Research Group in Computational Linguistics, please find further information on our Master and PhD studies page.
RGCL would like to congratulate two members of staff, Dr Sara Moze and Dr Victoria Yaneva, who have both been nominated for a VC Award for Staff Excellence.
The nominations, in the Innovation in Student Engagement category, resulted from the University’s recent student surveys. The question asked in the survey was as follows:
“Could you tell us about an individual or team who has had a positive impact on your learning experience? This could include through creative and stimulating teaching, learning and assessment methods”
We look forward to the shortlist being announced.
Dr Victoria Yaneva recently attended the 15th Web for All Conference and presented the co-authored paper ‘Detecting Autism Based on Eye-Tracking Data from Web Searching Tasks’. The paper won the Best Technical Paper award. We would like to congratulate Dr Yaneva and her co-authors Dr Le An Ha, Dr Sukru, Dr Yeliz Yesilada and Professor Mitkov.