The Research Group in Computational Linguistics invites applications for TWO 3-year PhD studentships in the area of translation technology. These two PhD studentships are part of a university investment which also includes the appointment of a reader (the equivalent of associate professor) and a research fellow, with the aim of strengthening the existing research undertaken by members of the group in this area. These funded student bursaries consist of a stipend towards living expenses (£14,500 per year) and remission of fees.
Last week, the RGCL and SCRG PhD Students presented their research to their peers and staff members from across the University. The posters were well received.
Statistical Cybermetrics Research Group
David Foster: ‘Determining YouTube Video Popularity: Analysing YouTube User Behaviours’
Kuk Aduku: ‘Do Patents Cite Conference Papers as Often as Journal Articles in Engineering? An Investigation of Four Fields’
Research Group in Computational Linguistics
Mohammad Alharbu: ‘Readability Assessment for Arabic as a Second Language’
Najah Albaqawi: ‘Gender Variation in Gulf Pidgin Arabic’
This poster presents a quantitative variationist analysis of variability in Gulf Pidgin Arabic (GPA) morpho-syntax (Arabic definiteness markers, Arabic conjunction markers, object or possessive pronouns, the GPA copula, and agreement in the verb phrase and the noun phrase). It aims to discover the potential effect of three factors: speakers’ gender, speakers’ first language, and the number of years spent in the Gulf.
Richard Evans: ‘Sentence Rewriting for Language Processing’
This poster provided an overview of the OB1 sentence simplification system. In this approach, the functions of various textual markers of syntactic complexity (conjunctions, relative pronouns, and punctuation marks) are identified and used to inform an iterative rule-based sentence transformation process.
Ahmed Omer: ‘New Techniques For Finding Authorship in Arabic Texts’
The degree of stylistic difference between a pair of documents can be found by any of a number of measures which compare the sets of linguistic features for each document. In general, the technique first finds a set of linguistic features and a difference measure which successfully discriminate between texts known to be by either author A or author B. Texts of unknown authorship are then compared against these texts to see whether their writing style is more similar to that of author A or author B.
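As a rough illustration of the general idea described above (not the method used in the poster), the comparison can be sketched in Python. The feature list, the whitespace tokenisation, the Manhattan distance, and the function names `profile`, `manhattan` and `attribute` are all assumptions chosen for brevity. A real system for Arabic would need proper tokenisation and a feature set validated on texts of known authorship.

```python
from collections import Counter


def profile(text, features):
    """Relative frequency of each feature word in a text (hypothetical feature set)."""
    tokens = text.lower().split()  # naive whitespace tokenisation, an assumption
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in features]


def manhattan(p, q):
    """Sum of absolute differences between two feature profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def attribute(unknown, texts_a, texts_b, features):
    """Return 'A' or 'B', whichever author's known texts are closer on average."""
    u = profile(unknown, features)

    def mean_distance(texts):
        return sum(manhattan(u, profile(t, features)) for t in texts) / len(texts)

    return "A" if mean_distance(texts_a) <= mean_distance(texts_b) else "B"


# Toy English example; the texts and the feature words are invented for illustration.
features = ["the", "and", "of"]
known_a = ["the cat and the dog and the bird"]
known_b = ["kind of out of sorts of late"]
print(attribute("the fox and the hen", known_a, known_b, features))
```

In practice, the feature set and the distance measure would first be checked on texts of known authorship, as the paragraph above describes, before being applied to disputed texts.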
Omid Rohanian: ‘NLP Approaches to estimating Text Difficulty’
I am exploring NLP approaches in investigating text difficulty at the level of concepts.
Shiva Taslimpoor: ‘Automatic Extraction and Translation of Multiword Expressions’
We employ state-of-the-art word embedding approaches to automatically identify and translate idiosyncratic Multiword Expressions.
On Wednesday 7 June, RGCL welcomed Javier Pérez-Guerra from the University of Vigo in Spain. Javier is currently a Visiting Researcher at the Linguistics and English Language Department, Lancaster University, and we were very pleased that he could spare the time to visit and to give a talk to our Research Group. The talk was well attended and very well received!
TITLE: Coping with markedness in English syntax: on the ordering of complements and adjuncts
This talk examines the forces that trigger two word-order designs in English: (i) object-verb sentences (*?The teacher the student hit) and (ii) adjunct-complement vs. complement-adjunct constructions (He taught yesterday Maths vs. He taught Maths yesterday). The study focuses both on the diachronic tendencies observed in the data in Middle English, Early Modern and Late Modern English, and on their synchronic design in Present-Day English. The approach is corpus-based (or even corpus-driven) and the data, representing different periods and text types, are taken from a number of corpora (the Penn-Helsinki Parsed Corpus of Middle English, the Penn-Helsinki Parsed Corpus of Early Modern English, the Penn Parsed Corpus of Modern British English and the British National Corpus, among others). The aim of this talk is to look at the consequences that the placement of major constituents (e.g. complements) has for the parsing of the phrases in which they occur. I examine whether the data are in keeping with determinants of word order like complements-first (complement plus adjunct) and end-weight in the periods under investigation. Some statistical analyses will help determine the explanatory power of such determinants.