RGCL welcomed visitors Matthias Schlögl and Katalin Eszter Lejtovicz from the Austrian Centre for Digital Humanities earlier this week. Whilst they were here, Matthias and Katalin presented their current research to the group. The seminar was well received and prompted an interesting discussion.
The Austrian Centre for Digital Humanities (ACDH) of the Austrian Academy of Sciences is a research institute set up with the declared intention of fostering the humanities by applying digital methods and tools in a wide range of academic fields. It offers a growing portfolio of services: running a repository for digital language resources, hosting and publishing data, developing software, and working to establish a tightly knit network of specialised knowledge centres by offering advice and guidance to the research community.
In his presentation, Matthias Schlögl concentrated on the APIS project. The ultimate goal of APIS is the semantic enrichment of the roughly 18,000 biographies published so far in the Austrian Biographical Dictionary (ÖBL). In the course of the project, a Virtual Research Environment (VRE) was developed that allows researchers to annotate biographies, link entities to reference resources, and visualise or export the results. Data generated in the VRE is used to train and evaluate Natural Language Processing (NLP) tools, which write their own annotations back to the VRE, where researchers can in turn review them. Matthias also showed some NLP-related tools, such as a web-based tool for re-training named entity recognition (NER) models, that are in use or under development at the ACDH to support digital humanities projects like APIS.
In her presentation, Katalin Eszter Lejtovicz explained how the APIS project aims to extract information from unstructured biographical documents by detecting named entities, linking them to Linked Open Data vocabularies, and finding relations between the entities. The presentation introduced the steps of information extraction in APIS and gave a brief overview of the tools and resources used in the project, along with those the team has been experimenting with. To name a few: Apache Stanbol for entity linking, IEPY and GATE for relation extraction, and GermaNet and Wikidata for disambiguation.