RGCL welcomed visitors Matthias Schlögl and Katalin Eszter Lejtovicz from the Austrian Centre for Digital Humanities earlier this week. Whilst they were here, Matthias and Katalin presented their current research to the group. The seminar was well received and prompted an interesting discussion.
The Austrian Centre for Digital Humanities (ACDH) of the Austrian Academy of Sciences is a research institute set up with the declared intention of fostering the humanities by applying digital methods and tools in a wide range of academic fields. It offers a growing portfolio of services: running a repository for digital language resources, hosting and publishing data, developing software, and working to establish a tightly knit network of specialised knowledge centres by offering advice and guidance to the research community.
In his presentation Matthias Schlögl will concentrate on the APIS project. The ultimate goal of APIS is the semantic enrichment of the roughly 18,000 biographies published so far in the Austrian Biographic Dictionary (ÖBL). In the course of the project a Virtual Research Environment (VRE) was developed that allows researchers to annotate biographies, link entities to reference resources, and visualise or export the results. Data generated in the VRE is used to train and evaluate Natural Language Processing (NLP) tools, which in turn store their annotations in the VRE, where researchers can review them. Matthias will also show some NLP-related tools (e.g. a web-based tool for re-training named entity recognition (NER) models) that are in use or in development at the ACDH to support digital humanities projects like APIS.
In her presentation Katalin Eszter Lejtovicz will explain how the APIS project aims to extract information from unstructured biographical documents by detecting Named Entities, linking them to Linked Open Data vocabularies and finding relations between the entities. The presentation will introduce the steps of information extraction in APIS and give a brief overview of the tools and resources that are used in the project, as well as those the team has been experimenting with. To name a few: Apache Stanbol for Entity Linking, IEPY and GATE for relation extraction, and GermaNet and Wikidata for disambiguation.
Last week we enjoyed a visit from Dr. Shiyan Ou from the School of Information Management, Nanjing University, China. The group enjoyed her visit and her seminar was very well received.
Title: Unsupervised Citation Sentence Identification based on Similarity Measurement
Abstract: Citation Context Analysis has attracted the interest of many researchers in the field of bibliometrics. Its first step is to extract the context of each citation from a citing paper. We proposed a novel unsupervised approach for identifying implicit citation sentences, that is, citation sentences that carry no citation tag. Our approach selects the neighbouring sentences around an explicit citation sentence as candidate sentences, calculates the similarity between each candidate sentence and both the cited and the citing paper, and deems those that are more similar to the cited paper to be implicit citation sentences. To calculate text similarity, we proposed four methods based on the Doc2vec model, the Vector Space Model (VSM) and the LDA model. The experimental results showed that the hybrid method combining the probabilistic TF-IDF weighted VSM with the TF-IDF weighted Doc2vec obtained the best performance. Unlike supervised methods, our approach does not need any annotated training corpus, and thus should in theory be easy to apply to other domains.
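The decision rule described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's implementation: it swaps in a plain TF-IDF weighted bag-of-words vector with cosine similarity in place of the hybrid VSM and Doc2vec measure, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `implicit_citation_sentences`) and the example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + f)) + 1.0 for t, f in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def implicit_citation_sentences(candidates, cited_text, citing_text):
    """Keep candidates that are more similar to the cited paper than to the citing paper."""
    vectors = tfidf_vectors(candidates + [cited_text, citing_text])
    cited_vec, citing_vec = vectors[-2], vectors[-1]
    return [
        sentence
        for sentence, vec in zip(candidates, vectors)
        if cosine(vec, cited_vec) > cosine(vec, citing_vec)
    ]


# Hypothetical example: sentences neighbouring an explicit citation sentence.
candidates = [
    "Their parser relies on a transition-based dependency algorithm.",
    "We evaluate our summarisation system on news articles.",
]
cited = "A transition-based dependency parser algorithm for fast parsing."
citing = "This paper presents a summarisation system evaluated on news articles."
print(implicit_citation_sentences(candidates, cited, citing))
```

Here only the first candidate shares more vocabulary with the cited text than with the citing text, so it is the one kept as an implicit citation sentence.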
Our next visitor will be Professor Gloria Corpas Pastor who will be giving lectures on the 9th and 10th April.
On Monday we welcomed Reshmi Gopalakrishna Pillai, who gave a joint talk to RGCL and SCRG on her PhD. This was a jam-packed seminar with standing room only!
Detection of strength and causal agents of stress and relaxation in social media content
Measurement of the magnitude of a person’s psychological stress and relaxation from their social media content can be made more accurate by incorporating word sense disambiguation. The causal agents of stress and relaxation can also be inferred from that content.
- Can the accuracy of psychological stress or relaxation strength detection from social media content be improved using word sense disambiguation?
- Can Aspect-Based Sentiment Analysis methods for detecting keywords in tweets be used to identify causal agents of stress and relaxation in social media content?
Our next seminar is this Friday, 23rd March, and will be given by Dr. Shiyan Ou, visiting from Nanjing University, China. For more details please contact firstname.lastname@example.org
For our latest Research Seminar on the 7th March we welcomed Hanna Bechara, PhD student, over from Dublin. Her talk on her thesis was well attended and prompted interesting questions from the group.
TITLE: Semantic Textual Similarity and its Application in Evaluation
Semantic Textual Similarity measures the degree of semantic equivalence between two sentences or phrases. Similarity measures between sentences are required in a wide variety of NLP applications, such as information retrieval and automatic text summarisation. Our work investigates the applications of semantic textual similarity in evaluation. We focus specifically on the evaluation of machine translation and automatic text simplification. By using methods previously employed in Semantic Textual Similarity (STS) tasks, we use semantically similar sentences and their quality scores as features to estimate the quality of machine translated sentences. Our results show that this method can improve the prediction of machine translation quality for semantically similar sentences. We also apply the semantic similarity methods to evaluate the output of automatically simplified text, and find that our features are strong indicators of quality. On the Shared Task on Quality Assessment for Text Simplification (QATS), our classification systems ranked second overall among all participating systems and consistently outperformed the baseline for all types of quality measures.
Our next seminar will be given by Reshmi Gopalakrishna Pillai on Detection of strength and causal agents of stress and relaxation in social media content. This will be from 13:00 on the 19th March in MU402. All are welcome and we look forward to seeing you there.
At the end of February, RGCL welcomed Sheila Castilho from Dublin City University. During her visit she gave a lecture comparing PBSMT and NMT systems. The lecture was well received and also attended by the Research Group’s MA students.
TITLE: A multifaceted comparison between PBSMT and NMT systems
Dr. Aline Villavicencio from the University of Essex (UK) and the Federal University of Rio Grande do Sul (Brazil) is visiting RGCL in April. She will be giving a talk on Identifying Idiomatic Language with Distributional Semantic Models on the 19th April 2018 (abstract below). If you are interested in attending the talk please contact A.Harper2@wlv.ac.uk for more details.
Identifying Idiomatic Language with Distributional Semantic Models
Precise natural language understanding requires adequate treatments both of single words and of larger units. However, expressions like compound nouns may display idiomaticity, and while a police car is a car used by the police, a loan shark is not a fish that can be borrowed. Therefore it is important to identify which expressions are idiomatic, and which are not, as the latter can be interpreted from a combination of the meanings of their component words while the former cannot. In this talk I discuss the ability of distributional semantic models (DSMs) to capture idiomaticity in compounds, by means of a large-scale multilingual evaluation of DSMs in French and English. A total of 816 DSMs were constructed in 2,856 evaluations. The results obtained show a high correlation with human judgments about compound idiomaticity (Spearman’s ρ=.82 in one dataset), indicating that these models are able to successfully detect idiomaticity.
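For readers unfamiliar with the evaluation measure quoted in the abstract, the sketch below shows how Spearman’s ρ between model predictions and human judgments can be computed: both lists are converted to ranks (ties share their mean rank) and the Pearson correlation of the ranks is taken. The function names and the four scores are invented purely for illustration and are not taken from the talk.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a model's compositionality score for four compounds
# and the corresponding (invented) human compositionality ratings.
model_scores = [0.91, 0.15, 0.40, 0.75]
human_ratings = [4.8, 0.5, 3.0, 2.0]
print(round(spearman_rho(model_scores, human_ratings), 2))  # 0.8
```

Because only the ordering of the scores matters, ρ rewards a model that ranks compounds from idiomatic to compositional in the same order as human annotators, regardless of the scale of its raw scores.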