A day in the life of…Dr Michael Oakes

Hello. I am Michael Oakes, and I have been a Reader in the Research Group in Computational Linguistics for about a year and a half. Previously I spent 13 years at the University of Sunderland, teaching computing in general, so now it feels exciting to be in a group dedicated specifically to Natural Language Processing. After starting here, I took a few more months to finish my book “Literary Detective Work on the Computer”. The book started way back in 2008, when Prof. Mitkov suggested that I write a book for the book series he edits for the John Benjamins Publishing Company. The book was to be centred around computational stylometry, the computer analysis of writing style. He suggested that studies of disputed authorship, plagiarism and spam (unwarranted email campaigns) should be considered together, partly because they often uncover fraudulent behaviour, but also because they all consider the question of where a text originally came from, and how similar one text is to another.

I started work in the summer of 2008, while visiting family in Hong Kong. The Hong Kong University library were kind enough to let me use their facilities, and I began with the chapter on Shakespeare. The chapter takes the standpoint that all of Shakespeare’s most famous plays were indeed written by Shakespeare, so the focus is on the so-called Shakespeare “apocrypha” – plays for which there is some historical evidence to suggest that Shakespeare might have had a hand in their composition. The idea was to show how computers have been able to indicate the extent of Shakespeare’s contribution in each case. The fourth chapter looked at computer analyses of religious texts. It was originally inspired by my colleague at Sunderland, the late Harry Erwin, who was interested in whether parts of the book of John might have come from an earlier source called the “Signs Gospel”. The findings largely agreed with the beliefs of modern theologians: Luke wrote Acts, the Gospel of John is stylistically distinct from the Apocalypse, and Paul is most likely to have written his four main letters, the “Hauptbriefe”. Our own experiments, and the experiments of others, are inconclusive on the questions of the “Signs Gospel” and the possible existence of an early source of the gospels called “Q”. The majority of computer studies on religious texts are concerned with the New Testament, but new work is also starting to emerge on the Book of Mormon and the Qur’an. In the final chapter, I consider the decipherment of lost languages. In some respects, computer techniques can only scratch the surface of this task, and it is difficult even to show that a set of ancient symbols constitutes language. However, the mathematics behind these techniques, and what these techniques do show us, are of considerable interest in themselves. The most extensive case studies in the chapter are the Rongorongo writings of Easter Island and the Indus seals of Northern India and Pakistan.
Coming full circle, the first two chapters were written last. See https://benjamins.com/#catalog/books/nlp.12/main

We have recently validated a new course called MA in Language and Information Processing, which we hope will build on the success of the previous MA in Natural Language Processing and Human Language Technologies. I just arrived at Wolverhampton to teach six afternoon sessions on the old course. The new course has two new modules, one of which is Machine Learning, and also the one I will be teaching, Corpus Linguistics with R. A corpus is a large amount of text stored on the computer for statistical and linguistic analysis, while R is a computer programming language for statistics. We are all very excited about this, waiting for our first cohort of students to arrive in early October. More information about this course can be found on the “Teaching” page of this web site.

This job brings a number of opportunities for travel abroad. I joined an EU-funded group called PARSEME, which is interested in Multi-Word Expressions, phrases which can only be understood as a whole rather than by analysing the constituent words. Our first meeting in Frankfurt was held in a former Nazi chemical factory now devoted to peaceful purposes. With PARSEME, we also visited the beautiful island of Malta. In October 2013 I went to Dubrovnik in Croatia, to teach a “Fall School” on “Statistics for Linguistics”, organised by the University of Bergen.