Natural Language Processing and Information Retrieval Workshop
RANLP 2017
September 7, 2017 - Varna, Bulgaria

Workshop Program

You can find the workshop program here (pdf)

19 Dec 2017: The proceedings of all the RANLP 2017 workshops are available at and will be uploaded and indexed in the ACL Anthology soon

Invited Speakers


Fact Checking in the Post Truth Era


Given the constantly growing proliferation of false claims online in recent years, there has also been growing research interest in automatically distinguishing false rumors from factually true claims. In this talk, we will present several related problems. First, in the context of investigative journalism, we will address the problem of automatically identifying which claims in a political debate are most check-worthy and should be prioritized for fact checking. We will then present a general-purpose deep learning framework for fully automatic fact checking using external sources, which taps the potential of the entire Web as a knowledge source to confirm or reject a claim. We will further extend this framework to the context of community question answering, where the goal is to decide whether an answer to a factual question is factually true or false. Finally, we will describe an application of this framework to the problem of fake news and clickbait detection, presenting the architecture of a system that won a recent hackathon on this problem.

About the speakers:

Dr. Preslav Nakov

is a Senior Scientist at the Qatar Computing Research Institute, HBKU. His research interests include computational linguistics and natural language processing (for English, Arabic and other languages), machine translation, question answering, fact-checking, sentiment analysis, lexical semantics, Web as a corpus, and biomedical text processing. Preslav Nakov co-authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and many research papers in top-tier conferences and journals. He received the Young Researcher Award at RANLP'2011. He was also the first to receive the Bulgarian President's John Atanasoff award, named after the inventor of the first automatic electronic digital computer. Preslav Nakov is Secretary of ACL SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics. He is also a Member of the Editorial Board of the Journal of Natural Language Engineering, an Associate Editor of the AI Communications journal, and an Editorial Board member of the Language Science Press Book Series on Phraseology and Multiword Expressions. He has served on the program committees of the major conferences and workshops in computational linguistics, including as a co-organizer and as an area/publication/tutorial/shared task chair, Senior PC member, student faculty advisor, etc.; he co-chaired SemEval 2014-2016 and was an area co-chair of ACL, EMNLP, NAACL-HLT, and *SEM, a Senior PC member of IJCAI, and a shared task co-chair of IJCNLP’2017. Preslav Nakov received a PhD degree in Computer Science from the University of California at Berkeley (supported by a Fulbright grant and a UC Berkeley fellowship), and an MSc degree from Sofia University. He was a Research Fellow at the National University of Singapore, an honorary lecturer at Sofia University, and a research staff member at the Bulgarian Academy of Sciences.

Georgi Karadzhov

is working on projects focused on natural language processing, style masking/author obfuscation, question answering and fact-checking. He has explored verifying claims using data automatically extracted from the Internet as an external source of information for general-purpose fact-checking. Additionally, Georgi was part of teams awarded first place in multiple challenges, including the 'Hack the fake news' Hackathon in 2017 and the Author Obfuscation task at the PAN lab, hosted by CLEF in 2016. Georgi has just finished his MSc in Information Retrieval at Sofia University, where he is also a teaching assistant for Artificial Intelligence, Information Retrieval and Natural Language Processing courses for both Bachelor's and Master's level classes. Currently, he is working at the SiteGround hosting company, building real-life NLP applications used by thousands of users every day.

Tsvetomila Mihaylova

has a Master's degree in Information Retrieval and Knowledge Discovery from Sofia University "St. Kliment Ohridski". She is also a Senior Software Engineer. She is interested in Natural Language Processing and her research interests include question answering and fact-checking.

Pepa Gencheva

is a recent M.S. graduate in Artificial Intelligence from the Faculty of Mathematics and Informatics at Sofia University. She is a teaching assistant at Sofia University, where she leads laboratory exercises in Natural Language Processing and Information Retrieval for Master's students and in Artificial Intelligence for Bachelor's students. Her research interests lie in automating separate stages of the fact-checking process, including finding check-worthy statements and estimating the credibility of claims. In the past, she also worked on question answering and author profiling. She is also a winner of several challenges, including SemEval 2016 Task 3 on Community Question Answering and a Hackathon on Fake News 2017 in Bulgarian. Since October 2016 she has also been working at the hosting company SiteGround, bringing smart applications of NLP to life.


MappSent: a Textual Mapping Approach for Question-to-Question Similarity


Since the advent of word embedding methods, the representation of longer pieces of text such as sentences and paragraphs has been gaining more and more interest, especially for textual similarity tasks. Inspired by the findings of Mikolov (2013) and Arora (2017) and by a bilingual word mapping technique presented in (Artetxe, 2016), we introduce MappSent, a novel approach for textual similarity. Based on a linear sentence embedding representation, its principle is to build a matrix that maps sentences into a joint subspace where similar sets of sentences are pushed closer together. We evaluate our approach on the SemEval 2016/2017 question-to-question similarity task and show that overall MappSent achieves competitive results and in most cases outperforms state-of-the-art methods.

About the speaker:

Dr. Amir Hazem

is a post-doc in the department of computer science (LS2N laboratory) at the University of Nantes (France), working with associate professor Nicolas Hernandez on discourse analysis and multi-modal textual similarity. Previously, he completed his PhD, entitled "Bilingual terminology extraction from specialized comparable corpora", in 2013 at the University of Nantes under the supervision of Prof. Emmanuel Morin. He also completed a post-doc in 2014 at the University of Maine (LIUM laboratory, France) on machine translation, working with Prof. Holger Schwenk and Assistant professors Loic Barrault and Fethi Bougares. His research interests lie in the Natural Language Processing (NLP) field, ranging from discourse analysis and multi-modal textual similarity to machine translation and bilingual terminology extraction, as well as complex synonym extraction.


N-gram graphs and Entity graphs: from sub-word to entity neighborhoods


The talk outlines the Entity Graphs representation, designed to represent events and usable in applications such as Summarization, Clustering and Information Retrieval. We communicate the motivation behind the new representation, the evolution of the basic idea of n-gram graphs into entity graphs and the findings from a set of single- and multi-lingual event detection (clustering) experiments. We conclude with possible avenues of research to connect the Entity Graphs representation to IR tasks.

About the speaker:

Leonidas Tsekouras

is an associate researcher at the Software and Knowledge Engineering Lab at the National Centre for Scientific Research "Demokritos". He studied Informatics and Telematics at Harokopio University, where his undergraduate thesis was about text comparison using n-gram word graphs and information extraction techniques.


What are people searching for?


With the growth of data accessibility in recent decades, Information Retrieval has become a necessity in people's everyday lives. In this talk I want to discuss how IR has changed over the years, covering its past and current tasks and approaches. Particular emphasis will be placed on sentiment analysis as one of the main subjects in both NLP and IR. The talk will conclude with some speculations about possible future directions of IR development.

About the speaker:

Dr. Victoria Bobicev

Victoria Bobicev teaches Computational Linguistics and Natural Language Processing at the Technical University of Moldova. She teaches the courses "Semantic Interpretation of Text", "Knowledge Based Systems" and "Statistical Text Analysis" for undergraduate and master's students at the "Informatics and System Engineering" department, Computers, Informatics and Microelectronics faculty, at the Technical University of Moldova. She received her PhD at the same university with a thesis entitled "Statistical Methods and Algorithms of Text Processing (based on Romanian texts)". The text classification method developed in the thesis was successfully applied to author recognition, discrimination between similar languages and native language identification. In 2012, she participated in the project "Learning Personal Health Information in user-written Web content" at Eastern Ontario Research Institute, where she investigated author identification and sentiment analysis in user-generated content. Her current interest is sentiment analysis. Her published works cover sentiment analysis of health-related forums, tweets and news.