Possible PhD Topics in RGCL

This page lists possible PhD topics suggested by members of staff in RGCL. These topics are intended to give PhD applicants an idea of the scope of the work in the Institute.

In addition you can read about current PhD research and past PhD theses.

As part of their PhD application, applicants have to submit a PhD proposal outlining their intended area of research. This could be based on one of the topics on this page, although the proposal itself needs to be much more detailed. Applicants are also very welcome to suggest their own topics. In both cases, they should contact the potential supervisor before submitting an application.

Please note that there is no specific funding attached to any of the suggested topics. However, students in receipt of scholarships or self-funded students with a strong academic background and programming skills are eligible to apply.


Cross-Lingual Information Access to Web Content

Supervisors: Prof. Gloria Corpas-Pastor / Prof. Ruslan Mitkov

PhD proposals investigating the following areas are welcome:

  • Discovering effective ways of accessing web content in different languages and retrieving different genres and topics.
  • Studying different methods for cross-lingual information access (CLIA).
  • Identifying professional needs and requirements with respect to cross-lingual information retrieval.
  • Studying multilingual document summarization techniques to make foreign language documents accessible to the user.

Automated content-based scoring of essays

Supervisor: Dr Le An Ha

Automated content-based scoring of free texts such as student essays is not intended to replace human scorers. Instead, it will serve as a quality control tool and help to improve scoring reliability and accuracy. PhD proposals are invited that will focus on exploiting NLP technologies, including text segmentation and text similarity, to detect appropriate content in texts and score them accordingly.


Automatic question generation from multimedia sources

Supervisor: Dr Le An Ha

In the context of learning, questions (and answers) can serve different purposes. They can be used to arouse learners' curiosity, motivate them to find the answers, or serve as an assessment tool. Authoring good questions is time-consuming, however. We welcome PhD proposals that will investigate novel ways to generate high-quality questions using a variety of NLP techniques, and combine them with computer vision technologies to generate questions that are challenging content-wise, well-formed linguistically, and visually appealing.


Computational Lexicography

Supervisors: Prof. Patrick Hanks, Dr Sara Moze

“Many, if not most, meanings depend for their normal realization on the presence of more than one word.” – Prof. J.M. Sinclair (1998)

Proposals for PhD topics are invited that will shed light on the relationship between phraseology and meaning.  Recent corpus-driven research has shown that traditional methods in lexicography (e.g. collection of citations by human readers, or relying on introspection as a source of evidence) can result in failure to capture relevant generalizations. As a result, a great variety of topics in lexicography are in need of empirical investigation, using computational methods such as statistical analysis of corpus evidence. New text-processing tools for identifying and processing meaning need to be developed.  Existing resources, including our Pattern Dictionary of English Verbs (www.pdev.org.uk; work in progress), need to be verified and expanded in new directions. Corpus Pattern Analysis (CPA) needs to be applied to nouns and adjectives as well as verbs, requiring development of new analytic procedures and frameworks, with the possibility of surprising and exciting new insights into the nature of meaning in language.

Possible research topics include analysing selected classes of verbs (for example, light verbs; verbs of perception; reporting verbs; domain-specific verb senses, etc.), CPA of adjectives,  CPA of nouns, analysis of the semantic types used in CPA,  NLP methods for pattern extraction, and NLP applications of CPA products such as PDEV.

We are also interested in applying CPA to languages other than English.


Computational Analysis of Figurative Language and other Exploitations

Supervisors: Prof. Patrick Hanks, Dr Sara Moze

Figurative language is an important theme in cognitive linguistics, but comparatively little has been done by way of computational and corpus-driven analysis. Proposals are invited for PhD research involving topics such as empirical analysis of figurative language (including cross-linguistic analysis); creative vs. conventional metaphor; the limits of creativity; the structure and function of similes; the semantics of anomalous arguments; and/or other aspects of semantic resonance.


Computational Analysis of Syntactic Alternations

Supervisor: Prof. Patrick Hanks

Syntactic alternations of the kind proposed by Beth Levin (1992), a generative linguist, include causative/inchoative, resultative, conative, reciprocal, and many others. Hanks (2013) is one of several authors who have drawn attention to problems with Levin’s approach to this important topic. Proposals are invited for PhD research involving empirical, corpus-driven analysis of syntactic alternations in English. This could also include cross-linguistic studies.


Corpus-driven meaning analysis: norms and exploitations

Supervisor: Prof. Patrick Hanks

Proposals are invited for PhD research into any corpus-driven (or text-driven) aspect of the Theory of Norms and Exploitations (TNE; see Hanks, 2013). The aim is to shed light on how people use words to make meanings. In this theory, meanings are regarded as dynamic events: interpersonal interactions between speaker and hearers or (with displacement in time) writer and readers. TNE is corpus-driven, building, for the interpretation of text meaning, on the disparate foundations laid by Wittgenstein, Ogden and Richards, Rosch, Putnam, Firth, Sinclair, and others. There is a need to extend this work in many directions, including analysis of collocations, syntagmatics, intertextuality, metaphor, and the emerging new conventions of communication in social media, among many other possibilities.


Identifying and Reasoning with Temporal Information

Supervisor: Dr Georgiana Marsic

Prospective students are invited to pursue PhD topics related to temporal information identification and/or its use in applications such as Information Retrieval or Question Answering. Potential PhD proposals could focus on event recognition/detection and typing, as well as on the task of temporal relation identification and classification. The projects could also investigate the use of temporal information in specific applications or to address different user needs.


Semantic Search

Supervisor: Dr Georgiana Marsic

Semantic search is a type of information retrieval that aims to identify information relevant to a user query by analysing both the query and the data collection being searched at a deeper level than keywords alone. It is heavily reliant on language semantics, as well as on the user’s intent and context. The use of different Natural Language Processing techniques to understand the contextual meaning of words and to derive concepts from words would constitute a very interesting and cutting-edge research direction, and we would welcome PhD proposals focusing on this area.


(Multilingual) Cyberbullying detection in social networks

Supervisor: Dr Vinita Nahar

Cyberbullying is becoming an urgent security concern, affecting millions of people – particularly teenagers – in their early explorations in cyberspace. Due to the dynamic nature of streaming data and information overload, traditional computing technologies are lagging behind the urgent need for real-time cyberbullying detection in social networks. We would welcome PhD research that aims to exploit NLP technologies to provide better linguistic understanding of individuals, and to build language models of suspected individuals (victims, bullies and bystanders) in social networks.


Author profiling in social networks

Supervisor: Dr Vinita Nahar

People often leave anonymous reviews, blogs, or messages giving their opinion on a particular product, movie, or political campaign. For many organisations, it is useful to determine the age and gender of people who like or dislike their products for marketing purposes. Similarly, in the field of forensic crime investigation, knowing the profile information (age and gender) of the author of threatening or anonymous messages, for example, is extremely important. However, this information is usually unavailable in the public domain due to privacy reasons. PhD proposals are invited that aim to develop users’ age and gender prediction models in social networks by using NLP applications and techniques to explore linguistic knowledge.


Sentiment analysis in social networks

Supervisor: Dr Vinita Nahar

Sentiment analysis is the process of classifying the opinion expressed in a given document as positive, negative, or neutral. Beyond identifying these categories, more fine-grained emotions (e.g. anger, joy, sadness, surprise, disgust and fear) are required for context reasoning and decision-making. We invite PhD proposals that investigate NLP methods and combine them with temporal analysis and topic analysis to capture the dynamics of user behaviour and emotions with precision.


Sarcasm detection in Twitter

Supervisor: Dr Vinita Nahar

Sarcasm is a way of expressing a – normally – negative, harsh or bitter message in a positive way. Sarcasm detection poses a major challenge in linguistic feature extraction and selection in short-text messages, such as determining whether the phrase “that’s great, isn’t it!!” is meant sincerely or sarcastically. PhD proposals are invited to develop user behavioural patterns and models using different NLP techniques to capture sarcastic remarks on Twitter.


(Multilingual) Cyber terrorism detection

Supervisor: Dr Vinita Nahar

Radicalisation often leads to violent acts, such as cyber terrorism, which have a hugely negative impact on society. It is challenging to spot suspected activities relating to cyber terrorism quickly enough to act. There is scope to investigate how natural language processing, machine learning, social graph mining and visualisation can quickly locate suspect content or traces relating to terrorist activities and trace the location of suspected individuals or groups, and we would welcome PhD proposals focusing on this area.


Stylometry of Wittgenstein

Supervisor: Dr Michael Oakes

Computational Stylometry is the study, using computers, of writing style, and PhD proposals are invited that investigate this topic. Working with the Wittgenstein Archives in Bergen, techniques of computational stylometry will be used to: classify Wittgenstein’s writings by theme and genre; estimate the most likely dates of his undated works; and determine the relative contributions of Wittgenstein and other writers from the “Vienna Circle” to writings which were dictated, jointly written, or of uncertain authorship.


Applying authorship attribution techniques to the problem of plagiarism

Supervisor: Dr Michael Oakes

A PhD project in this area could focus on techniques from authorship attribution studies that create profiles of individual writers, and quantify aspects of their writing styles. Newly submitted essays will be compared against these profiles rather than a database of previously submitted essays, to flag up those cases where the writing style of a new essay does not match the style of its supposed author. We will achieve this by identifying features that can capture individual writing traits. By producing metrics which determine the degree of similarity between texts, we will derive scores based on comparing new texts with author profiles, and return a score that reflects whether the new text is coherent with the writing style of the author.
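The idea of comparing a new essay against an author profile can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not a description of any existing RGCL system: the choice of function-word frequencies as style features, the short `FUNCTION_WORDS` list, cosine similarity as the metric, and the function names. A real project would investigate far richer features and metrics.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was"]


def style_vector(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def build_profile(known_texts):
    """Average the style vectors of an author's known writings."""
    vectors = [style_vector(text) for text in known_texts]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def consistency_score(profile, new_text):
    """Score how closely a new text matches an author profile (higher = closer)."""
    return cosine_similarity(profile, style_vector(new_text))
```

A low `consistency_score` would only flag an essay for human review; where to set the threshold, and which features actually capture individual writing traits, are exactly the open questions a proposal could address.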


Corpus-Based Translation Studies (CBTS)

Supervisors: Dr Michael Oakes / Prof. Gloria Corpas-Pastor

Corpora are collections of texts stored electronically for linguistic analysis. Recently there has been interest in corpora of original texts and their translations, and quantitative methods of comparing them. Due to the growing interest in the use of corpus material and methodologies in translation research, there is a need for the adaptation of the various statistical tests used in corpus linguistics in general for the purpose of translation research. The development of quantitative analytical methods in CBTS will help in the construction and testing of theoretical models for literary translations, and enable the expansion of the field of translation studies as a whole.
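As one illustration of the kind of statistical test in question, the sketch below computes the log-likelihood (G²) statistic commonly used in corpus linguistics to compare the frequency of a word in two corpora, for example a corpus of original texts and a corpus of translations. The choice of this particular test and the function name `log_likelihood` are assumptions made for the example; the topic description does not prescribe any specific test.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """G2 statistic for a word occurring freq_a times in corpus A
    (size_a tokens) and freq_b times in corpus B (size_b tokens)."""
    total = size_a + size_b
    # Expected frequencies if the word were equally likely in both corpora.
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With one degree of freedom, a value above 3.84 corresponds to p < 0.05. Whether such a test remains appropriate when the two corpora are originals and their translations, and are therefore not independent samples, is the sort of question this topic raises.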


(Multilingual) Text Simplification and Text Summarisation

Supervisor: Dr Constantin Orasan

Students are invited to pursue PhD topics related to text summarisation and language simplification. The fields of language simplification and text summarisation have developed largely independently, but they could benefit considerably from each other. Possible PhD topics include:

  • using methods from text simplification to produce automatic abstracts and compress sentences
  • proposing methods which consider the structure of the discourse during the simplification process
  • investigating how the existing methods for language simplification and text summarisation can be adapted to other languages
  • developing methods for language simplification and text summarisation for specific applications and target users.

Semantically enhanced translation memories

Supervisor: Dr Constantin Orasan

Recent research (Gupta et al. 2015) has shown that paraphrasing can improve the retrieval of segments from translation memories and speed up the translation process. We invite PhD proposals on how to use other types of semantic information (e.g. named entities, word sense disambiguation, semantic role labelling, terminologies, etc.) to improve the performance of translation memories.
