Translation Memories (TMs) are among the tools most widely used by professional translators, if not the most widely used. The underlying idea of TMs is that a translator should benefit as much as possible from previous translations by being able to retrieve how a similar sentence was translated before. Moreover, the use of TMs aims to guarantee that new translations follow the client’s specified style and terminology. Although the core idea of these systems relies on comparing segments (typically of sentence length) from the document to be translated with segments from previous translations, most existing TM systems hardly use any language processing for this comparison. Instead of addressing this issue, most work on translation memories has focused on improving the user experience, for example by supporting a variety of document formats and providing intuitive user interfaces.
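To make this limitation concrete, the kind of surface-level segment comparison described above can be sketched as follows. This is only an illustrative sketch, not the algorithm of any particular TM tool: the function names (`edit_distance`, `fuzzy_score`, `best_match`), the whitespace tokenisation and the 0.7 threshold are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_score(source: str, candidate: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical token sequences."""
    # Hypothetical tokenisation: lowercase and split on whitespace only.
    a, b = source.lower().split(), candidate.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(source: str,
               memory: List[Tuple[str, str]],
               threshold: float = 0.7) -> Optional[Tuple[float, str, str]]:
    """Return (score, stored source, stored target) for the best match
    at or above the threshold, or None if no stored segment qualifies."""
    best = None
    for tm_source, tm_target in memory:
        score = fuzzy_score(source, tm_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


# Toy memory of (source segment, stored translation) pairs.
memory = [
    ("Press the green button", "Appuyez sur le bouton vert"),
    ("Close the window", "Fermez la fenêtre"),
]
match = best_match("Press the red button", memory)
```

Because the comparison works purely on surface tokens, a paraphrase or an inflected variant of a stored segment is penalised as heavily as an unrelated word, which is the gap that linguistically informed matching aims to close.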
The term "second-generation translation memories" has been around for more than ten years, and it promises translation memory software that integrates linguistic processing in order to improve the translation process. This linguistic processing can involve matching of subsentential chunks, edit distance operations between syntactic trees, and the incorporation of semantic and discourse information in the matching process. This workshop invites papers presenting second-generation translation memories and related initiatives.
Terminologies, glossaries and ontologies are also very useful for translation memories, as they facilitate the task of the translator and help ensure consistent translations. The field of Natural Language Processing (NLP) has proposed numerous methods for terminology extraction and ontology extraction. Researchers are encouraged to submit papers to the workshop which show how these methods are being successfully applied to translation memories. In addition, papers discussing the integration of Machine Translation (MT) and translation memories, or studies on the automatic building of translation memories from corpora, are also welcome.
This workshop invites original papers which show how language processing can help translation memories. Topics of interest include but are not limited to:
- improving matching and retrieval of segments by using morphological, syntactic, semantic and discourse information
- automatic extraction of terminologies and ontologies for translation memories
- integration of named entity recognition and terminologies in matching and retrieval
- using natural language processing for automatic construction of translation memories
- extracting and aligning TM segments from a parallel or comparable corpus
- construction of translation memories using the Internet
- corpus-based studies of the usefulness of TMs for specific domains
- development of hybrid TM and MT translation systems
- studies of the NLP techniques used by TM tools available on the market
- automatic methods for TM cleaning and maintenance
The first edition of this workshop, organised at RANLP 2015, confirmed that there is interest in the research community in the topics proposed. In addition, it highlighted the need for automatic methods for cleaning translation memories. For this reason, the second edition of the NLP4TM workshop will also organise a shared task on cleaning translation memories, in an attempt to make the creation of resources for translation memories easier.
The workshop will also organise a round table which will give participants the opportunity to discuss their ideas about the future of the field. A well-known researcher in the field will be invited to give a keynote speech and to chair this round table.
The workshop will be partially supported by the FP7 ITN project EXPERT (http://expert-itn.eu).