NLP4TM 2016: Shared task

The NLP4TM 2016 workshop proposes a shared task on cleaning translation memories. Participants in this task will be required to take pairs of source and target segments from translation memories and decide whether they are correct translations. For this first edition of the task, three language pairs have been prepared: EN-ES, EN-IT and EN-DE.

The data was annotated with information on whether the source and target content of each TM segment represent a valid translation. In particular, the following three-point scale was applied:
(1) The translation is correct.
(2) The translation is correct, but there are a few orthotypographic mistakes, so some minor post-editing is required.
(3) The translation is not correct (content missing/added, wrong meaning, etc.).

The data was annotated following language-pair-specific guidelines for EN-ES, EN-IT and EN-DE, which are available on the task’s website.

For each language pair, 2/3 of the annotated segments are provided for training and 1/3 will be provided for testing during the evaluation phase.

1. Tasks proposed

The participating teams can choose to participate in any or all of the following three tasks:

  • Binary Classification (I)
    In this task, participants are only required to determine whether a segment is correct or wrong. For this first binary classification option, only tag (1) is considered correct, because the translators do not need to make any modification, whilst tags (2) and (3) are considered wrong translations (see the label-mapping sketch after this list).
  • Binary Classification (II)
    As in the first task, participants are only required to determine whether a segment is correct or wrong. However, in contrast to the first task, a segment is considered correct if it was labelled by the annotators as (1) or (2). Segments labelled (3) are considered wrong because they require major post-editing.
  • Fine-grained Classification
    In this task, the participating teams have to classify the segments according to the annotation provided in the training data: correct translations (1), correct translations with a few orthotypographic errors (2), and wrong translations (3).
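
As an illustration, both binary tasks can be derived from the three-point annotation simply by collapsing labels. The following is a minimal sketch, assuming the gold labels are stored as the integers 1-3 described above; the helper names are illustrative and not part of the official task definition.

```python
# Sketch: deriving the two binary tasks from the 3-point annotation.
# Assumes labels are the integers 1-3; function names are illustrative.

def binary_i(label: int) -> str:
    """Binary Classification (I): only label 1 counts as correct."""
    return "correct" if label == 1 else "wrong"

def binary_ii(label: int) -> str:
    """Binary Classification (II): labels 1 and 2 count as correct."""
    return "correct" if label in (1, 2) else "wrong"

gold = [1, 2, 3, 1]
print([binary_i(l) for l in gold])   # ['correct', 'wrong', 'wrong', 'correct']
print([binary_ii(l) for l in gold])  # ['correct', 'correct', 'wrong', 'correct']
```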

2. Submission and evaluation information

Participants are required to register their intention to participate by filling in the following form before 1st April 2016: http://goo.gl/forms/ELStRtrw9J

The organisers will provide the training and test sets to the participating teams, who will be asked to submit the output of their systems in a format similar to that of the training set. The exact modality and formatting of submissions will be communicated to participants at a later stage.

For evaluation, standard measures such as precision, recall and F-measure will be used. In addition, the organisers may perform a manual error analysis, the extent of which will depend on the number of systems submitted. For this reason, even though we do not plan to limit the number of runs submitted by participants, they will be required to indicate their primary (and, if relevant, secondary) runs.
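
For reference, the sketch below computes per-class precision, recall and F-measure, as might be done for the fine-grained task. The label set and example data are illustrative only; the official scoring procedure (e.g. how the measures are averaged across classes) is determined by the organisers.

```python
# Sketch of per-class precision, recall and F-measure for the
# fine-grained task. Labels and example data are illustrative only.

def prf(gold, pred, label):
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [1, 1, 2, 3, 3, 2]
pred = [1, 2, 2, 3, 1, 2]
for label in (1, 2, 3):
    p, r, f = prf(gold, pred, label)
    print(f"label {label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```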

The participants are encouraged to release their systems and make them publicly available for future use. They are also encouraged not to use machine translation as one of the factors for determining the class of a segment, because the aim is to encourage the development of methods that can be run on large datasets without requiring substantial computational resources.

In addition to submitting the output of their systems, the participants will be asked to submit short contributions in the form of working notes describing their systems. These notes will be published on the workshop’s website; submissions that are not accompanied by a system description will not be considered.

All systems will be presented in a demo session during the workshop.

Please see the important dates page for the various deadlines related to the shared task.