Last week, Antonio Pascucci, a visiting industrial Ph.D. student from the University of Naples – L’Orientale working on Computational Linguistics for Authorship and Gender Attribution in Italian social media texts, gave a Researcher Seminar to the group.
Title: ‘Computational Stylometry for Authorship Attribution in social media texts’
Computational Stylometry (CS) is the study of stylistic features, that is, an author’s linguistic choices. Writing style is a combination of decisions made in language production. A statistical analysis of these decisions can help identify an author and infer many other characteristics about them. Writing style is, in fact, considered unique to an individual, which is why we talk about authorial DNA.
CS for authorship attribution is the topic of my research project, which aims to apply it to social media texts. During the seminar, I will present the research project and my first steps in gender attribution, along with research on cyberbullying detection conducted with software made available by Expert System Corp.
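To make the idea of a statistical style profile concrete, here is a minimal sketch in Python. It is not the method presented in the seminar: it assumes that style can be approximated by the relative frequencies of a few English function words, and it attributes a text to the known author whose profile has the highest cosine similarity. The word list, the function names and the toy texts are all hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny, hand-picked list of function words stands in for
# the much richer feature sets used in real stylometric work.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "i", "it", "but"]


def style_vector(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(text, known_texts):
    """Return the author whose sample is stylistically closest to the text."""
    target = style_vector(text)
    return max(
        known_texts,
        key=lambda author: cosine(target, style_vector(known_texts[author])),
    )


# Toy writing samples, invented for illustration only.
known = {
    "alice": "the cat and the dog and the bird",
    "bob": "i think that i know that it is",
}
guess = attribute("the fox and the hen", known)
```

Short social media posts give very sparse counts, so a realistic system would need far more text per author and a larger feature set than this sketch uses.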
Every year, the University of Wolverhampton awards 10 research fellowships to support projects led by researchers who obtained their PhD in the last 5 years. The initiative, called the Early Research Award Scheme (ERAS), provides a budget of up to 5,000 pounds to each project. The programme has existed since 2016, and fellows are selected on a competitive basis.
Marcos Zampieri, a member of RGCL and RIILP, was selected to be part of the 2018-2019 cohort of ERAS fellows with a project entitled “Identifying and Categorizing Offensive Language in Social Media”. The project deals with the application of computational methods to identify offensive and aggressive language and hate speech in social media. The funding will support the annotation of a large offensive language dataset that will be used in a SemEval 2019 task.
For more information, please check Marcos’ recent publications on the topic [3,4,5].
Pablo Calleja, from the Ontology Engineering Group at the Universidad Politécnica de Madrid, Spain, is currently completing an internship with RGCL as part of his PhD. Yesterday, Pablo gave a talk to the group about his research.
Title: Role-based Named Entity Recognition over unstructured texts
Named Entity Recognition (NER) poses new challenges in real-world documents, in which entities play different roles according to their purpose or meaning. Retrieving every possible entity in scenarios where only a role-specific subset is needed introduces noise and lowers overall precision.
The talk will present a Role-based NER task that relies on role classification hierarchy models to support the recognition of entities with a specific role. The proposed task has been implemented in two use cases: one in the biomedical domain, using Spanish Summaries of Product Characteristics for drugs, and the other in the legal domain, using multilingual and heterogeneous emails from the Panama Papers investigation.
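The effect of restricting recognition to one role can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the models described in the talk: the role hierarchy is a hand-written lookup table rather than a learned classifier, and the role names and example entities are hypothetical.

```python
# Assumption: a hand-written hierarchy mapping each role to its parent role.
# A real system would predict roles with a trained classification model.
ROLE_PARENT = {
    "active_ingredient": "drug",
    "excipient": "drug",
    "drug": None,
    "disease": None,
}


def has_role(role, wanted):
    """True if `role` equals `wanted` or is one of its descendants."""
    while role is not None:
        if role == wanted:
            return True
        role = ROLE_PARENT.get(role)
    return False


def filter_by_role(entities, wanted):
    """Keep only the (text, role) pairs that carry the wanted role."""
    return [(text, role) for text, role in entities if has_role(role, wanted)]


# Hypothetical output of a generic NER step over a drug leaflet.
entities = [
    ("ibuprofen", "active_ingredient"),
    ("lactose", "excipient"),
    ("headache", "disease"),
]
drugs = filter_by_role(entities, "drug")
ingredients = filter_by_role(entities, "active_ingredient")
```

Asking for the general `drug` role keeps both of its sub-roles, while asking for `active_ingredient` discards everything else, which is the kind of noise reduction the abstract refers to.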
Last week, we were visited by Dr Chun Chang and Zhao Jie from the Institute of Scientific and Technical Information of China. Dr Chang and Zhao Jie spent the day in meetings with members of RGCL discussing future collaborations, but in between meetings Dr Chang found the time to give a talk to the group. The details can be found below.
Title: The Construction and Application of Chinese Thesaurus in China
Dr. Chun Chang, Professor, The Institute of Scientific and Technical Information of China. Dr. Chang has long been engaged in the construction and application of the knowledge organization system.
Focusing on the construction and application of the Chinese Thesaurus, the lecture will discuss three main aspects: the definition of a thesaurus and basic information on constructing thesauri in China; the history and current state of the construction of the Chinese Thesaurus, and the application of Computational Linguistics in the compilation process; and the current and prospective applications of the Chinese Thesaurus in information retrieval.
On Monday, Antoni Oliver González, from the Universitat Oberta de Catalunya (UOC) in Barcelona, arrived at RGCL for a two-week stay to form research collaborations with members of the group. On Thursday, Antoni gave the following talk to the group:
Title: Automatic detection of translation equivalents of terms in large parallel and comparable corpora
Abstract: In this talk, some methodologies for finding the translation equivalents of a term in large parallel and comparable corpora will be presented. For parallel corpora, we are using translation tables from Statistical Machine Translation systems (Moses). For comparable corpora, we are experimenting with vecmap, a tool for creating cross-lingual word embedding mappings. The experiments will be carried out using the IATE database for English for two subjects: International Relations and International Organizations. The goal is to enlarge the Spanish IATE database and to create this database for Catalan.
These experiments are being performed during a short research stay, and we will only be able to present preliminary results.
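For the comparable-corpora approach, the basic lookup step can be sketched in Python as follows. The sketch assumes that the source and target embeddings have already been mapped into a shared space by a tool such as vecmap; it does not call vecmap itself. The three-dimensional vectors and the function names are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_candidates(term, source_vectors, target_vectors, k=2):
    """Rank target-language words by similarity to the source term."""
    query = source_vectors[term]
    ranked = sorted(
        target_vectors,
        key=lambda word: cosine(query, target_vectors[word]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors, assumed to live in one shared cross-lingual space.
english = {
    "treaty": [0.9, 0.1, 0.0],
    "embassy": [0.1, 0.9, 0.1],
}
spanish = {
    "tratado": [0.8, 0.2, 0.1],
    "embajada": [0.2, 0.8, 0.0],
    "mesa": [0.0, 0.1, 0.9],
}
best = translation_candidates("treaty", english, spanish, k=1)
```

Real terms are often multi-word units, so a practical system would also need a way to build vectors for phrases, which this sketch leaves out.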
The Research Group in Computational Linguistics (RGCL) has been successful in its application for a European Masters in Technology for Translation and Interpreting (EM TTI).
EM TTI will be run by a strong consortium consisting of the University of Wolverhampton, University of Malaga (Spain), University of Ljubljana (Slovenia) and New Bulgarian University (Bulgaria) and will deliver a cohesive, integrated Europe-wide programme. Bringing together these four higher education institutions, which are leaders in research on computational aspects of language study and on state-of-the-art technology for translation and interpreting, will give students access to high-profile academics and best practices across the field. Students on the two-year degree course will have the opportunity to study at multiple universities and undertake industry placements related to their dissertation.
EM TTI will produce specialists in translation and interpreting who are up to date with the latest applications that support their daily work. The disciplines involved are translation, interpreting, language technology, and linguistics.
This was a highly competitive application process. Prof. R Mitkov, the coordinator of the programme and Director of the Research Institute, commented: ‘This programme is not only the first Erasmus Mundus Master programme on Technology for Translation and Interpreting but the very first Master programme in the world on this topic. It will not only enhance the visibility of the research group and university, but will also create a very special teaching and research vibrant environment on the topics covered.’
The funding of 3 million Euros granted by the EC will cover 60 scholarships across the consortium. The offer of scholarships will drive competition for places and ensure candidates of the highest calibre are selected. Students will be awarded a Multiple Master’s degree from the institutions where they study.
The new programme will begin in September 2019, with applications opening in November/December 2018. For any further information, please contact Amanda Bloore, Project and Funding Officer for RIILP (A.Bloore@wlv.ac.uk).