Hello, my name is Richard Evans. I’m employed as a research fellow and am currently undertaking a part-time PhD at the University of Wolverhampton. I first joined the research group in 1998, having obtained a BA (Hons) in Linguistics from the University of Wales (Bangor) and an MSc in Cognitive Science and Natural Language from the University of Edinburgh. I love the challenges of computational linguistics and natural language processing, and the creativity that those challenges inspire.
Since my appointment, my main research interests have been in anaphora resolution (where programs try to work out which phrases in a text the pronouns get their meaning from) and information extraction (where programs automatically identify and tabulate interesting facts in a text). While developing an information extraction system, I realised that a lot of the errors that it made were caused by its poor handling of long sentences that contain large numbers of facts. This led to my current work in automatic sentence rewriting for language processing. In this context:
- Sentence rewriting is the automatic conversion of long complex sentences into sequences of short simple sentences;
- Language processing includes cognitive language processing by people and automatic language processing by machines in applications such as machine translation and information extraction.
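To make the first idea concrete, here is a toy sketch of what splitting a compound sentence at a conjunction might look like. This is not my actual system, which handles many more constructions; the function name and example sentence are invented purely for illustration.

```python
# Toy illustration of sentence rewriting: split a compound sentence
# at the coordinating conjunction "and" into shorter stand-alone
# sentences. A real system would use full syntactic analysis.

def naive_rewrite(sentence):
    """Split a sentence at ' and ' and return short sentences."""
    parts = [p.strip().rstrip('.') for p in sentence.rstrip('.').split(' and ')]
    # Re-capitalise each fragment and restore the final full stop.
    return [p[0].upper() + p[1:] + '.' for p in parts if p]

long_sentence = ("The patient reported chest pain and "
                 "the doctor ordered an ECG.")
for short in naive_rewrite(long_sentence):
    print(short)
```

Even this crude split shows the shape of the task: each output sentence carries one proposition, at the cost of losing the discourse link between them.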
From the human perspective, evaluating automatic sentence rewriting systems is difficult because previous research offers no clear link between real human text comprehension and the automatic methods used to estimate text readability. Many metrics have been developed to estimate readability, but it’s not clear how well they predict how easy a text is for people to understand.
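One of the oldest of these metrics is the Flesch Reading Ease score, which combines average sentence length and average syllables per word. The sketch below uses a crude vowel-group heuristic to count syllables, so its scores are only approximate; real implementations use dictionary lookups.

```python
# Sketch of the Flesch Reading Ease score:
#   206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
# Higher scores indicate easier text.
import re

def count_syllables(word):
    """Approximate syllables as runs of vowels (a crude heuristic)."""
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r'[.!?]+', text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) \
                   - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("The cat sat on the mat."), 1))
```

Metrics like this are easy to compute, which is exactly why the question of whether they track real comprehension matters so much.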
In 2011–2014, I worked on FIRST, a project to help carers convert texts into a more accessible form for people with autism spectrum disorder (ASD). We proposed several new readability measures to predict how easy a text would be for people with ASD to understand. Unfortunately, those new measures are open to the same criticism as the other readability metrics.
My colleague Victoria Yaneva has run several experiments using eye tracking and reading comprehension tests to assess how easily various texts can be understood by people with autism and by neurotypical controls. I am currently collaborating with her to assess the correlation between several readability metrics and the real-world reading comprehension of the texts those metrics evaluate. If we can find metrics that correlate closely with reading comprehension, then it may be possible to use them to predict the ease of comprehension of previously unseen texts. This type of information is important when developing systems for automatic sentence rewriting, because it enables researchers to predict how easily people will understand the texts their systems generate. It will also enable them to detect automatically which texts require some rewriting and which are easy enough to understand in their original form. We’re going to submit a paper about this to the 2015 conference on Recent Advances in Natural Language Processing (RANLP). If it’s accepted, we’ll go to Bulgaria to present our work.
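The kind of check we have in mind can be illustrated with Pearson’s correlation coefficient, computed here from scratch. The metric scores and comprehension accuracies below are invented purely for illustration, not real experimental data.

```python
# Pearson's r between a readability metric's scores for a set of
# texts and the measured comprehension accuracy on those texts.
# A value near +1 or -1 suggests the metric tracks comprehension.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

metric_scores = [72.0, 65.5, 58.0, 49.5, 41.0]  # invented readability scores
comprehension = [0.90, 0.85, 0.70, 0.60, 0.55]  # invented test accuracies
print(round(pearson(metric_scores, comprehension), 3))
```

In the real study the comprehension side comes from eye-tracking and test data rather than invented numbers, and rank correlations would also be worth reporting alongside Pearson’s r.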
Psycholinguistic studies have shown that reading comprehension depends on the number of ideas and facts (“propositions”) contained in the sentences of a text. Multi-propositional sentences are more difficult to understand than others. They often contain various items such as conjunctions (e.g. “and”, “or”), complementisers (e.g. “that”, “which”), and punctuation marks (e.g. “,” and “;”). Today, I’m checking the progress of a program I developed to identify the grammatical functions of these signs of syntactic complexity. It will give information such as “OK, this comma here is introducing a relative clause” or “this conjunction ‘and’ is linking two clauses together”. This information can be used to help the sentence rewriting program I’m developing decide exactly how to rewrite a long, complex sentence as several short, easy-to-understand sentences.
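As a rough illustration of what detecting these signs involves, here is a toy rule-based tagger. My real program learns the grammatical function of each sign from annotated data; this sketch only spots the signs and assigns each a coarse class, and the word lists are a deliberately tiny sample.

```python
# Toy detector for "signs of syntactic complexity": conjunctions,
# complementisers, and clause-separating punctuation. Real systems
# must also disambiguate each sign's grammatical function in context.

SIGN_CLASSES = {
    'and': 'conjunction', 'or': 'conjunction', 'but': 'conjunction',
    'that': 'complementiser', 'which': 'complementiser', 'who': 'complementiser',
    ',': 'punctuation', ';': 'punctuation',
}

def find_signs(sentence):
    """Return (position, token, class) for each sign in the sentence."""
    tokens = sentence.replace(',', ' , ').replace(';', ' ; ').split()
    return [(i, t, SIGN_CLASSES[t.lower()])
            for i, t in enumerate(tokens) if t.lower() in SIGN_CLASSES]

for pos, token, cls in find_signs(
        "The project, which began in 2011, produced tools and resources."):
    print(pos, token, cls)
```

Note that a detector like this says nothing about *function*: it cannot tell a list comma from a comma introducing a relative clause, which is precisely the disambiguation my program has to perform.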
Inspired by my research in information extraction, I’m interested in finding ways to evaluate the benefits that my sentence rewriting program can bring to other NLP tasks. With my colleagues Dr Constantin Orasan and Hanna Bechara, I’m planning to test whether automatic machine translation of the kind done by Google Translate is more accurate when translating automatically rewritten sentences, and whether sentence rewriting has any role to play in the automatic evaluation of machine translation systems.
This year, I began work on the CAPTNS project funded by the US National Board of Medical Examiners, who develop exam questions for doctors. CAPTNS is concerned with the automatic scoring of medical notes written by doctors in their training. The type of language used in these hastily written notes is quite far removed from the carefully edited texts that we usually process in our work. We hypothesise that some type of sentence rewriting can improve the accuracy of our scoring program. I’m currently working to apply the know-how gained through my previous research toward accurate rewriting of the sentences in doctors’ notes.
I’m very much enjoying my time in the research group. The work is hard but the supervision is very supportive and there is a lot of freedom for staff and students to find innovative solutions to the challenges that we face.