RGCL welcomes Lut Colman

Last week Lut Colman from the Instituut voor de Nederlandse Taal (INT), Leiden, visited RGCL.

The main objective of Lut’s visit was to gain a deeper understanding of Corpus Pattern Analysis (CPA), a corpus-driven technique developed by Prof. Hanks and implemented in the Pattern Dictionary of English Verbs (PDEV), and to test the lexicographic tools used for PDEV in order to establish whether or not they are suitable for her Dutch pilot project.  Whilst Lut was here, she gave a talk on her upcoming research project.

Title: Dutch Verb Patterns Online: A Collocation and Pattern Dictionary of Dutch Verbs


Dutch Verb Patterns Online is a project to be developed at the Dutch Language Institute (INT) in Leiden. A pilot will consist of a collocation and pattern dictionary of a selection of verbs for advanced learners of Dutch as a second language. For that purpose, the institute will form a consortium with two partners who have expertise in developing e-learning material for language learners.

The aim of the project is a database and web application with information sections on verbs for language learners:

1) collocations: semi-fixed lexical combinations and fixed grammatical collocations that need not be defined, such as een fout {maken, begaan} (make a mistake), vertrouwen op (rely on), etc.

2) idioms: expressions that have to be defined because the meaning is opaque, such as de strijdbijl begraven (bury the hatchet)

3) GDEX examples: GDEX stands for "good dictionary examples", i.e. short, representative and illustrative example sentences from a corpus

4) verb patterns: semantically motivated pieces of phraseology in which the valency slots of the verb are occupied by arguments of a particular semantic type (e.g. human, location). Semantic types are realized by lexical sets: lists of words and phrases that occur as collocates. Each pattern corresponds to a meaning. Patterns are identified by means of Corpus Pattern Analysis (CPA), a lexicographical technique used by Patrick Hanks in the Pattern Dictionary of English Verbs, PDEV (http://pdev.org.uk/), and based on his Theory of Norms and Exploitations (Hanks 2013).

The Dutch project aims to combine a pattern dictionary with a collocation application like Sketch Engine for Language Learning (SkELL) (Baisa & Suchomel, n.d.). A Dutch SkELL can be developed before we get started with the more labour-intensive pattern descriptions. Eventually, both functionalities can be merged and included as a plug-in resource in the language material for second language learners. Students will not only have access to patterns and collocation lists separately, but will also be able to see which collocations fill a semantic type slot in a pattern.
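As a rough illustration of how such a merged resource might link collocations to pattern slots, here is a minimal sketch in Python. It is not the project's data model: the class names (Slot, Pattern), the role labels, the semantic type label "Error" and the method slots_filled_by are all hypothetical, and the single entry only reuses the collocation een fout maken from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One valency slot of a verb pattern (hypothetical structure)."""
    role: str                 # e.g. "subject" or "object"
    semantic_type: str        # e.g. "Human"; labels here are illustrative
    lexical_set: set = field(default_factory=set)  # collocates realizing the type


@dataclass
class Pattern:
    """A verb pattern: a verb, its typed slots and the associated meaning."""
    verb: str
    slots: list
    meaning: str

    def slots_filled_by(self, collocate):
        """Return the slots whose lexical set contains the given collocate."""
        return [slot for slot in self.slots if collocate in slot.lexical_set]


# Illustrative entry only; the semantic type "Error" is an assumption.
maken = Pattern(
    verb="maken",
    slots=[
        Slot(role="subject", semantic_type="Human"),
        Slot(role="object", semantic_type="Error", lexical_set={"fout"}),
    ],
    meaning="make a mistake",
)

for slot in maken.slots_filled_by("fout"):
    print(f"'fout' fills the {slot.role} slot ({slot.semantic_type}) of '{maken.verb}'")
```

Storing the lexical set on the slot, rather than in a separate collocation list, is one way to let a learner move from a collocation to the pattern and meaning it belongs to.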


Baisa, V., & Suchomel, V. (n.d.). SkELL: Web Interface for English Language Learning.

Hanks, P. (2013). Lexical Analysis. Norms and Exploitations. MIT Press.


RGCL welcomes Ximena Gutierrez-Vasques

Ximena Gutierrez-Vasques is currently visiting the Research Group in Computational Linguistics from the National Autonomous University of Mexico to collaborate with members of the group. On the 25th April, Ximena presented the group with a talk about her subject area.

Title: Bilingual lexicon extraction for a low-resource language pair


Bilingual lexicon extraction is the task of obtaining a list of word pairs deemed to be word-level translations. This has been an active area of NLP research for several years, especially with the availability of large amounts of parallel, comparable and monolingual corpora that allow us to model the relations between the lexical units of two languages.

However, the complexity of this task increases when we deal with typologically different languages where little data is available.

We focus on the language pair Spanish-Nahuatl. These two languages are spoken in the same country (Mexico) but they are distant from each other: they belong to different linguistic families, Indo-European and Uto-Aztecan respectively. Nahuatl is an indigenous language with around 1.5M speakers, and monolingual and parallel corpora for it are scarce.

Our work comprises the construction of the first publicly available digital parallel corpus for this language pair. Moreover, we explore the combination of several language features and statistical methods to estimate the bilingual word correspondences.
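To give a flavour of what a statistical estimate of word correspondences can look like, the sketch below scores word pairs by the Dice coefficient of their co-occurrence in sentence-aligned pairs. This is a generic textbook association measure, not necessarily the method used in the talk; the function names are hypothetical, and the tokens s1, t1, etc. are placeholders rather than real Spanish or Nahuatl data.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score (source, target) word pairs by Dice coefficient over aligned sentences."""
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in sentence_pairs:
        # Count each word type once per sentence pair.
        src_types, tgt_types = set(src_tokens), set(tgt_tokens)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        joint_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint_count.items()
    }


def best_translation(scores, word):
    """Return the highest-scoring target candidate for a source word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Placeholder tokens standing in for a tiny sentence-aligned corpus.
pairs = [
    (["s1", "s2"], ["t1", "t2"]),
    (["s1", "s3"], ["t1", "t3"]),
    (["s2", "s3"], ["t2", "t3"]),
]
scores = dice_scores(pairs)
print(best_translation(scores, "s1"))
```

With so little data available, a purely co-occurrence-based score like this one is typically only a starting point, which is why combining it with other language features is of interest.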

Welcome to Prof. Mikel Forcada

On Wednesday 6th April, RGCL were very pleased to welcome Prof. Mikel Forcada from the University of Alicante, Spain. Mikel is currently on sabbatical in England and we were grateful that he could spare the time to visit and give a talk to our Research Group. The talk, about translation technologies, was well attended and very well received!

Title: Towards effort-driven combination of translation technologies in computer-aided translation


The talk puts forward a general framework for the measurement and estimation of professional translation effort in computer-aided translation. It then outlines the application of this framework to optimize and seamlessly combine available translation technologies (machine translation, translation memory, etc.) in a principled manner to reduce professional translation effort. Finally, it shows some results that point to existing challenges, particularly as regards machine translation.

Jobs in translation technology at the Research Group in Computational Linguistics

The Research Group in Computational Linguistics at the University of Wolverhampton is currently recruiting a Reader in Translation Technology (permanent) and a Research Fellow in Translation Technology (3-year position with the possibility of extension). The purpose of these posts is to strengthen the research group by enhancing its research and publications in the field of translation technology. The appointed candidates will be expected to produce REF-returnable outputs, attract external income, seek industrial collaborations, teach at Masters level and supervise PhD students.

RGCL welcomes Eveline Wandl-Vogt

It was a great privilege to welcome Eveline Wandl-Vogt from the Austrian Academy of Sciences to RGCL this week. Eveline is a Research Manager at the Academy's Lexicography Laboratory who came to RGCL to discuss possible future collaborations with members of the Research Group. During her stay, Eveline gave a seminar on her research for members of the group.

Title: Computational Linguistics and Digital Humanities- Designing Joint Discovery on the example of lexicography laboratory @ ACDH @ AAS


AUTOR makes the front page

AUTOR has made the front page of the University of Wolverhampton’s new research newsletter ‘RESEARCH MATTERS’. We are delighted and honoured.

The newsletter celebrates research success and opportunities at the University of Wolverhampton.

Anyone wanting to know more about AUTOR, or how to get involved in this research, can contact Dr Victoria Yaneva by telephone on 01902 321630 or by email at v.yaneva@wlv.ac.uk.

AUTOR’s development can be followed at: www.autor4autism.com.

Syntactic complexity sign tagger demo released

The successfully completed FIRST project has developed various components which help users to analyse the complexity of texts and rewrite texts in order to make them more accessible for readers with Autistic Spectrum Disorder (ASD). These components were integrated in the OpenBook tool, but they cannot be used in isolation. In an attempt to make some of this technology available to other researchers, we have started releasing some of the components individually. The first component to be released as a web demo is the syntactic complexity sign tagger. This is a tool that assigns words and punctuation marks from a predefined set to categories indicating their syntactic linking and bounding functions. Some of these categories are used by our sentence rewriting algorithm.