RCGL Seminars

Digital Humanities

Dr Ahmed Omer, XTM International

5 July 2021

Title: Computational Stylometry of Arabic Literature


The successful application of stylometric methods to English texts has motivated researchers working with Arabic texts to investigate whether these methods can be used for Arabic as well. Taking into account the distinctive characteristics of the Arabic language, the main aim of my study is to identify which linguistic features are most useful for authorship attribution in Arabic texts. As well as using features derived from English studies of authorship attribution, I developed a number of feature sets derived from Arabic linguistic theory, namely Arud, Nazm and Wazn. The feature sets were compared on two corpora of travelogues, one in English and one in Arabic. They were examined in conjunction with agglomerative clustering methods and traditional machine learning classifiers including SVM, Naïve Bayes, and KNN, as well as a Deep Learning model implemented using the open source package Keras. The findings from this first part of the thesis were then used to examine six real-life case studies from Arabic: two on Authorship Attribution, two on Author Profiling, and two on Authorship Verification. These case studies, respectively, were:

· Was Al-Qarni’s “Don’t Despair” plagiarised from Salwa?

· Did Abdu or Amin write certain key chapters of “Women’s Rights”?

· Were the “Hanging Poems” pre-Islamic or more recent?

· A study of the dialectology of Arabic speech.

· Was a box of posthumous texts by the Nobel Prize winner Naguib Mahfouz indeed by him?

· Were some texts attributed to the mediaeval scholar Al-Ghazali written by him or by somebody else?
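
As a rough illustration of how a feature set can be paired with a KNN classifier for authorship attribution, the sketch below uses character trigram frequencies and cosine similarity. This is not the speaker's method: character n-grams stand in for the feature sets named in the abstract (Arud, Nazm, Wazn are not implemented here), and the function names char_ngrams, cosine and attribute are hypothetical names chosen for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return relative frequencies of overlapping character n-grams.

    Assumption: character n-grams are a generic stylometric feature,
    used here only as a stand-in for the feature sets in the abstract.
    """
    text = " ".join(text.split())  # normalise whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def cosine(a, b):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(disputed, known, k=1):
    """Return the majority author among the k most similar known texts.

    `known` is a list of (author, text) pairs.
    """
    target = char_ngrams(disputed)
    ranked = sorted(
        known,
        key=lambda pair: cosine(target, char_ngrams(pair[1])),
        reverse=True,
    )
    votes = Counter(author for author, _ in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because the profiles are built from Unicode characters, the same functions accept Arabic strings unchanged. In practice the known texts would be long samples from each candidate author, and the choice of n and k would need to be tuned on held-out data.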


Ahmed Omer has an M.Sc. in Computer Science from Napier University in Edinburgh and a Ph.D. in Computational Linguistics from the University of Wolverhampton. He now works at XTM International as a Computational Linguistics Expert. The company works in Machine Translation and uses the inter-language vector space method, which has also been used by Google and, more recently, by Facebook to enforce their policies and to translate texts for customers on their platforms.