RGCL is delighted to share that a team of our academics and a PhD student has recently been awarded the Best Paper Award for the Qur’an QA 2022 shared task. The shared task was part of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5), held at the 13th Language Resources and Evaluation Conference (LREC 2022). This is the first best paper award for the recently established RIGHT Lab.
The organisers evaluated the papers based on different metrics:
Title: DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain
Abstract:
The task of machine reading comprehension (MRC) is a useful benchmark to evaluate the natural language understanding of machines. It has gained popularity in the natural language processing (NLP) field mainly due to the large number of datasets released for many languages. However, the research in MRC has been understudied in several domains, including religious texts. The goal of the Qur’an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on Qur’an. This paper describes the DTW entry to the Quran QA 2022 shared task. Our methodology uses transfer learning to take advantage of available Arabic MRC data. We further improve the results using various ensemble learning strategies. Our approach provided a partial Reciprocal Rank (pRR) score of 0.49 on the test set, proving its strong performance on the task.
Search Solutions is an annual event run by the Information Retrieval Specialist Group, the section of the British Computer Society which has a special interest in search engines. This year it took place on Wednesday 24th November, and was held online for invited speakers from industry to talk about their work in information retrieval. The British Computer Society has new offices at 25 Copthall Avenue in London, near the Bank of England. On the day before, a series of tutorials designed to introduce people to related topics was held, such as one given by Ingo Frommholz from our own computing department on search engine evaluation.
I gave the tutorial on Natural Language Processing. Unlike the others, it was an all-day event, and held face-to-face. Having had experience of online teaching during the pandemic, I know that I prefer the closer interaction with the students which comes with face-to-face teaching.
The contents of the tutorial were almost the same as the first three weeks of lectures that I give on the MA Computational Linguistics module in RIILP. I used the structure of the textbook by Jurafsky and Martin as a skeleton, but brought in other things such as the practical exercises from the Edinburgh Textbooks in Empirical Linguistics on stemming and automatic part-of-speech tagging. Stemming covers techniques for treating different grammatical forms of a word as related to each other, and part-of-speech tagging assigns a part-of-speech category (such as noun or verb) to each word in the input sentence. I used the first edition of Jurafsky and Martin to open the discussion with a short dialogue between Dave the astronaut and HAL the computer from the film “2001 – A Space Odyssey”. What natural language techniques would HAL need to know to carry out this conversation?
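To make the two techniques just described a little more concrete, here is a minimal Python sketch. It is not taken from the tutorial materials or from either textbook: the suffix list, the tiny lexicon and the function names `toy_stem` and `toy_tag` are all hypothetical, chosen only for illustration. A real stemmer (such as the Porter stemmer) uses far more careful rules, and a real tagger uses context rather than a fixed lookup table.

```python
# A toy suffix-stripping stemmer and a toy lookup-based part-of-speech tagger.
# Everything here is an illustrative assumption, not a production technique.

# Suffixes are tried in this order; the first one that matches is stripped.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical hand-written lexicon mapping lower-cased words to categories.
TOY_LEXICON = {
    "open": "VERB",
    "the": "DET",
    "pod": "NOUN",
    "bay": "NOUN",
    "doors": "NOUN",
}


def toy_stem(word):
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_tag(tokens):
    """Pair each token with its lexicon category, or "UNK" if it is unknown."""
    return [(token, TOY_LEXICON.get(token.lower(), "UNK")) for token in tokens]


if __name__ == "__main__":
    # Different grammatical forms are reduced to the same stem.
    print([toy_stem(w) for w in ["opens", "opened", "opening"]])
    # Each word in the input sentence receives a category.
    print(toy_tag("Open the pod bay doors".split()))
```

Even this crude version shows the idea: "opens", "opened" and "opening" all reduce to the same stem, and each word of the sentence is paired with a category. It also shows the limits, since any word missing from the lexicon is simply tagged as unknown.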
At the event, I was pleased to see some old friends in the audience, including Ingo in the morning, before his own workshop began.
A number of staff and students recently attended RANLP 2021. It was held online this year due to the ongoing pandemic but, as you can see from the reports below, it was still a lively and engaging conference.
Another successful online conference I attended was RANLP. I enjoyed being able to attend many sessions and workshops from the comfort of my home office 😊 RANLP had very interesting keynote speeches, which were quite informative about the ongoing research trends in different NLP groups all over the world. My online presentation went very well; the only thing I missed was seeing the attendees' reactions while I was talking. I can concentrate either on my slides or on the participants 😊 But I was happy that the research ideas were of interest to many. The RANLP workshops were also excellent. Researchers from top-notch universities gave very interesting presentations. I look forward to repeating this wonderful experience in the future.
Hadeel Saadany – PhD Student
I recently participated in RANLP 2021 (Recent Advances in Natural Language Processing). RANLP has established itself over the years as one of the most influential and competitive NLP conferences. This year, due to the COVID situation in many countries, the organisers decided to hold the conference virtually using Zoom.
RANLP 2021 had excellent keynote speeches from top NLP researchers around the world. The organisers made sure that there were at least three keynote speeches per day: usually one to start the day, another after the lunch break, and a third to conclude the day.
Day 1 began with a keynote speech from Dr Jing Jiang of Singapore Management University, who talked about the latest research on question answering. In the afternoon, we had a keynote speech from Prof Josef van Genabith and Nico Herbig on translation technologies. They talked about implementing a multimodal user interface for post-editing.
Day 2 started with a keynote speech from Prof Hwee Tou Ng, who talked about current and future research directions in grammatical error correction. After the lunch break, we had a keynote speech from Prof Constantin Orasan, who gave a very informative session on preserving sentiment in machine translation. As he is my first PhD supervisor, he also talked about the research we did on translation quality estimation for my PhD, so this session was special for me. Later in the afternoon, we had a keynote from Dr He He of New York University on text generation. She talked about the latest developments in the area, including neural transformers.
The final day started with a keynote speech from Prof Tim Baldwin about text summarisation and the evaluation of text summarisation methods. The second keynote speech of the day was given by Prof Sebastian Riedel, who talked about learning from knowledge bases and reasoning in machine learning models. The final keynote was given by Prof Alessandro Moschitti, who presented a very informative session on recent developments in question answering.
Overall, the keynote speeches at RANLP were enlightening and provided useful insights into the state of the art in several NLP topics. It was great to listen to the pioneers of the field and hear about their first-hand experiences.
At RANLP 2021, I was fortunate to chair two parallel sessions. The first, on the 1st of September, contained four exciting long papers on offensive language identification, including offensive language identification in Spanish and Romanian. The second session I chaired was on the 2nd of September. It contained four fascinating papers on translation technologies. RANLP was my first experience as a session chair, and it was a good opportunity for me. I thank the RANLP organisers for allowing me to be a session chair.
I presented two papers at the conference. I presented my first paper on the 1st of September. It described the work we did on creating an offensive language identification corpus for a low-resource language, Marathi. I presented the second paper, on the timely topic of multilingual misinformation identification in COVID-19 tweets, on the 3rd of September. I received very good feedback from the audience on both papers, with comments on what to improve. I hope to incorporate them in my future work, and I am grateful to the RANLP participants for their valuable ideas.
During the conference, I got the opportunity to get to know several researchers working in the same field from universities worldwide, and the networking was very valuable. However, I did miss the physical presence and all the fun activities at RANLP, such as the cocktail receptions and the Gala dinner. I hope that the next RANLP conference can be held in person in Varna, Bulgaria, and that I can present at the venue. Finally, I would like to thank the organisers of RANLP for holding the conference despite the difficult situation in the world.
Researchers’ Week aims to provide postgraduate researchers with the opportunity to develop their research skills and knowledge, as well as their networks with other researchers and their community of practice.
This year the theme is ‘Vision 2030 – Developing our Research’, and Researchers’ Week & ARC will once again take place online as we make the most of our newfound skills of connecting virtually across the world.
Staff and students from any discipline and at any stage of research were invited to give a presentation that considers the theme, with a focus on Equality, Diversity & Inclusion, making Impact and addressing Societal Challenges within their own research or broader research area.
At the end of the week, all staff and students gathered for a virtual award ceremony to celebrate the achievements of our researchers and staff.
There were two nominations from RIILP. Professor Mitkov nominated Tharindu Ranasinghe, one of our PhD candidates, and Suman nominated her colleagues Amanda, Kate and April on the Admin Team.
RGCL are pleased to announce that one of our PhD students – Hadeel Saadany – has had two papers accepted at COLING 2020 workshops. Congratulations! We look forward to the conference in December.
First paper:
‘Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews’
Accepted in: The Fifth Arabic Natural Language Processing Workshop – COLING’2020, Barcelona, Spain, 12 Dec. 2020
Team: Hadeel Saadany and Constantin Orasan
Summary:
The paper investigates the challenges involved in translating User Generated Content (UGC) from Arabic into English, with a particular focus on the errors that lead to incorrect translation of sentiment polarity. It shows that fine-tuning an NMT model with respect to sentiment polarity can significantly help in correcting sentiment errors detected in the online translation of Arabic UGC.
Second paper:
‘Fake or Real? A Study of Arabic Satirical Fake News’
Accepted in: 3rd International Workshop on Rumours and Deception in Social Media – COLING’2020, Barcelona, Spain, 12 Dec. 2020
Team: Hadeel Saadany, Emad Mohamed, and Constantin Orasan.
Summary:
This paper conducts several exploratory analyses to identify the linguistic properties of Arabic fake news with satirical content. It shows that although it parodies real news, Arabic satirical news has distinguishing features on the lexico-grammatical level. It also builds a number of machine learning models capable of capturing satirical fake news with an accuracy of up to 98.6%.
I recently participated in the LxMLS summer school in Lisbon, Portugal. This is an annual event on the theory and application of machine learning, with an emphasis on natural language processing. The lectures followed a linear progression, starting from the fundamentals of traditional machine learning and later covering developments in deep learning. Each morning there was a lecture on some aspect of machine learning, and after lunch the students were assembled into groups to participate in practical programming sessions. In the afternoons there was also a talk on an application of machine learning in an actual research project.
In total there were more than 230 participants and the summer school lasted for 8 days. The lecturers are accomplished researchers in the field and the presentations were usually engaging and informative. I particularly enjoyed the talks given by Noah Smith, Chris Dyer, and Kyunghyun Cho. The event also included a poster presentation and a demo day where regional IT companies showcased their work and did recruitment advertising.
During the summer school I got the opportunity to get to know several PhD students working in the field from universities around the world, and the networking was very valuable. The practical coding sessions could have been organised better, with more supervision, but overall I consider the experience positive and worthwhile. I also found a bit of time during the day off to explore Lisbon and its surrounding areas. I enjoyed the historical delights and the amazing seafood, and I look forward to visiting Portugal again soon.