Category Archives: Seminars 2022

Digital Humanities

Thursday 19/05/2022 10:00 – 11:30 (UK time)

Speaker: Dr Ahmed Hamdi, La Rochelle University

Title: Content Analysis of Digital Text with Special Focus on Named Entity Recognition and Linking

Abstract: Digital humanities institutions are steadily contributing an increasing number of digital documents (either born-digital or digitised). Billions of these documents are scanned and archived as images, which represent a substantial resource for natural language processing (NLP) tasks. The analysis of digital documents therefore requires text extraction using optical character recognition (OCR) systems. Several studies have shown that named entities (NEs) are heavily used to index documents, since they are the first point of entry into a search system for document retrieval. For this reason, NEs can be given a higher semantic value than other words. To improve the quality of user searches in a system, it is thus necessary to ensure the quality of these particular terms. However, most digital documents are indexed through their OCRed version, which includes numerous errors that may hinder access to them. In my talk, I will discuss named entity recognition (NER) and entity linking (EL) for digital text, the impact of OCR errors on the performance of NER and EL systems, and existing strategies and solutions for dealing with OCR noise.

Speaker Bio: Ahmed Hamdi is a lecturer-researcher at the L3i laboratory, La Rochelle University, France. He received his PhD from Aix-Marseille University, France. He is well known for his work in automatic language processing, information extraction and document analysis. He has published in top conferences such as SIGIR and CoNLL. He has worked on several projects related to digital humanities, such as NewsEye (https://www.newseye.eu/), where he used different machine learning techniques to process historical newspapers. More details are available at https://pageperso.univ-lr.fr/ahmed.hamdi/.

Digital Humanities

Wednesday 27/04/2022 11:30 – 13:00 (UK time)

Speaker: Ashrakat Elshehawy, University of Oxford

Title: The Use of NLP for Data Creation and Analysis in Political Science: Computational Text Analysis using Newspapers and Legislation Documents

Abstract: In recent decades, governments have started to maintain an online presence for their archives and for documentation of their proceedings and decisions. Newspapers around the world continue to produce daily textual data. Groups and individuals are also rapidly adopting online platforms such as Twitter, Facebook, and Reddit, which constantly store data about users’ activities. All of this has made extensive text data available online that social scientists can use to answer pressing research questions that were previously difficult to approach. In this talk, I speak about the applications of Text as Data in the field of political science. Specifically, I focus on two types of text as data: newspaper articles and US legislation. The talk discusses a recent publication that applies NLP and text analysis to over one million news articles to identify the prevalence of Russian illiberal discourse and its timing relative to German elections. The talk also shows how NLP and computational text analysis methods are applied to US legislation to build a dataset about economic sanctions that improves on the coverage of US sanctions cases in previous datasets.

Speaker Bio: Ashrakat Elshehawy is a visiting PhD student at Yale University and a doctoral student at the Department of Politics and International Relations at the University of Oxford. Her research interests lie in the field of comparative political economy. Her research addresses questions related to the politics of public service provision and the politics of information. In her recent publications, she has focused on how foreign policy tools, such as economic sanctions, interact with domestic politics and how NLP techniques can be used to analyse them. She has authored several journal publications related to digital humanities, including “SASCAT: Natural Language Processing Approach to the Study of Economic Sanctions” and “Illiberal Communication and Election Intervention during the Refugee Crisis in Germany”. She has also taught several graduate-level courses on Applied Statistics, Python, and Computational Text Analysis.