For the last Staff Research Seminar of 2017, Dr Michael Oakes gave a talk on his current research. The paper was well received and was followed by questions and an interesting debate.
TITLE: Experiments on “The Dark Tower”, the Indus Script and the ENNTT Corpus.
In this talk I will give a brief introduction to the research I have been doing this year (and earlier).
Firstly, I will talk about the use of disputed authorship techniques, especially principal components analysis, to examine the probable authorship of “The Dark Tower”, which is generally attributed to the author of the “Narnia” series, C. S. Lewis.
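As a rough illustration of how principal components analysis can be applied to an authorship question, the sketch below projects text samples onto their first two principal components using the relative frequencies of a few common function words. It is a minimal sketch under stated assumptions, not the method used in the talk: the toy sentences, the word list and the helper names `relative_frequencies` and `pca_scores` are all invented for illustration.

```python
import numpy as np

def relative_frequencies(texts, vocabulary):
    """Return a (samples x words) matrix of relative word frequencies."""
    rows = []
    for text in texts:
        tokens = text.lower().split()
        total = max(len(tokens), 1)  # guard against empty samples
        rows.append([tokens.count(word) / total for word in vocabulary])
    return np.array(rows, dtype=float)

def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components via SVD."""
    centered = X - X.mean(axis=0)            # PCA requires mean-centred columns
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)    # share of variance per component
    return scores, explained[:n_components]

# Hypothetical toy samples: two in one "style", two in another.
texts = [
    "the king of the land and the lord of the sea",
    "the ship of the king and the crown and the sword",
    "a man in a room with a book in a chair",
    "a child in a garden with a dog with a ball",
]
vocabulary = ["the", "of", "and", "a", "in", "with"]  # assumed function-word list

X = relative_frequencies(texts, vocabulary)
scores, explained = pca_scores(X)

for label, (pc1, pc2) in zip(["A1", "A2", "B1", "B2"], scores):
    print(f"{label}: PC1={pc1:+.3f}  PC2={pc2:+.3f}")
```

In a real study the samples would be long text blocks and the word list would be much larger; samples that fall close together on the plot of the first two components share a similar function-word profile, which is then read as evidence about authorship.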
Secondly, I will look at the use of LNRE (Large Number of Rare Events) models to estimate the vocabulary size of the undeciphered Indus script, which was used in Northern India and Pakistan from approximately 2600 to 1900 BC.
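To give a feel for the kind of data that vocabulary-size estimation starts from, the sketch below computes a frequency spectrum (how many sign types occur once, twice, and so on) and derives two simple quantities from it. Note that it does not fit an LNRE model: as a plainly simpler stand-in it uses the Good-Turing estimate of the unseen probability mass and the Chao1 lower-bound estimator of the total number of types. The sign names are invented placeholders, not real Indus data, and the function names are hypothetical.

```python
from collections import Counter

def frequency_spectrum(tokens):
    """Map m -> number of distinct types that occur exactly m times."""
    type_counts = Counter(tokens)
    return Counter(type_counts.values())

def good_turing_unseen_mass(tokens):
    """Estimate the probability that the next token is a previously unseen type."""
    spectrum = frequency_spectrum(tokens)
    return spectrum.get(1, 0) / len(tokens)

def chao1_vocabulary_estimate(tokens):
    """Chao1 lower-bound estimate of the total number of types (seen + unseen)."""
    spectrum = frequency_spectrum(tokens)
    observed = len(set(tokens))
    singletons = spectrum.get(1, 0)
    doubletons = spectrum.get(2, 0)
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form for the case with no doubletons
    return observed + singletons * (singletons - 1) / 2

# Placeholder sign sequence (invented labels, for illustration only)
signs = ["fish", "jar", "fish", "man", "jar", "fish",
         "arrow", "wheel", "man", "comb"]

spectrum = frequency_spectrum(signs)
unseen_mass = good_turing_unseen_mass(signs)
estimated_types = chao1_vocabulary_estimate(signs)

print("spectrum:", dict(sorted(spectrum.items())))
print("unseen probability mass:", unseen_mass)
print("estimated total types:", estimated_types)
```

The many types that occur only once or twice are what make such a corpus a "rare events" problem: the observed inventory understates the true one, and the spectrum is the evidence used to estimate by how much.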
Thirdly, I will discuss the ENNTT (Europarl Corpus of Native, Non-native and Translated Texts) corpus, a subset of the Europarl corpus developed by Rabinovich et al. Using the ENNTT sub-corpus of texts translated into English, principal components analysis can be used to determine the language family (Romance or Germanic) of the language in which the texts were originally written and, to a lesser extent, even the individual language.