Shared task: frequently asked questions

Question: How many system submissions can we submit per team?
Answer: You can submit results for each of the tasks we have defined: Binary Classification (I and II) and Fine-Grained Classification. Although you may submit several systems per task, we would prefer that you submit only your best system for each task (i.e. 3 systems in total if you participate in all tasks).

Question: How big is the test data? Are the test and training from the same source / same domain?
Answer: The test set is half the size of the training set. We used a stratified 3-fold split to prepare the training and test sets: 2/3 of the whole data set is distributed as training data and the remaining 1/3 has been kept for testing.
The whole data set has been extracted from a very large translation memory database and comprises many domains, ranging from medicine to colloquial conversation.
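The stratified split described above can be sketched in miniature. This is only an illustration of the idea, not the organizers' tooling: the label names and counts below are made up.

```python
import random
from collections import Counter, defaultdict

# Toy stand-ins for the real sentence pairs and their category labels
# (the actual data and label distribution are hypothetical here).
labels = ["1"] * 6 + ["2"] * 3 + ["3"] * 3

# Stratified 3-fold idea: within each label, hold out one third of the
# items for testing, so both sets keep the same label proportions as
# the full data set.
rng = random.Random(0)
by_label = defaultdict(list)
for idx, lab in enumerate(labels):
    by_label[lab].append(idx)

train_idx, test_idx = [], []
for lab, idxs in by_label.items():
    rng.shuffle(idxs)
    cut = len(idxs) // 3          # 1/3 of this label goes to test
    test_idx.extend(idxs[:cut])
    train_idx.extend(idxs[cut:])

print(Counter(labels[i] for i in train_idx))  # 2/3 of each label
print(Counter(labels[i] for i in test_idx))   # 1/3 of each label
```

Because the split is done per label, the test set automatically mirrors the training set's label proportions, which is also why the answer to the next question holds.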

Question: Is the proportion of negative examples from the test data similar to the one of the training data? Will there be a skew towards one of the labels?
Answer: The proportion of the labels is the same in the training and test sets, since we used a stratified 3-fold split to prepare them.

Question: Should we subscribe to a Google group or mailing list to receive notifications and share questions with the organizers and other participants, so as to minimize the e-mail flow?
Answer: There is currently no Google group or mailing list; we may set one up at a later stage if the e-mail flow becomes too heavy. For the moment, you can send any queries to the organizers and one of us will reply as soon as we can.

Question: Will our system submission notes be added to the workshop proceedings?
Answer: Unfortunately, due to the LREC deadlines for submitting the final workshop proceedings, the submission notes will not be added to the proceedings. They will instead be made publicly available through the shared tasks’ website.

Question: Can you confirm that the final classification system has to be completely automatic, with no human interaction involved?
Answer: Yes. The classification should be fully automatic with no manual intervention.

Question: Should we exclusively use the official training data or are we allowed to merge it with additional training data?
Answer: For a fair comparison of the different participating systems, you should use only the training data we provide and submit the results obtained without any additional training data.

However, you may additionally submit a different system that uses extra training data, as long as you have also submitted a run without it and you clearly indicate that this second run includes additional data. In your notes, you should also describe the additional data you used.

Question: Can we use auxiliary tools such as POS taggers? What about monolingual language models or bilingual dictionaries? (On the website it is written that machine translation is discouraged)
Answer: You can use any external resources you want: POS taggers, dictionaries, Wikipedia, aligners, etc. When you submit your system description notes, you should clearly state which tools you used. Machine translation is discouraged because we want to encourage the development of methods that can be run on large data sets without requiring a lot of computational resources. However, if you are interested in seeing whether it is the best option, you are of course free to try and submit a system that uses it.

Question: Is there a baseline system that the submitted systems can be compared against?
Answer: We have implemented two baselines you can compare your system against:

  • The first baseline assumes that the training and test sets have the same category distribution. The categories in the test set are generated randomly, but they respect the training set's distribution.
  • The second baseline corrects the first baseline for Church-Gale scores higher than the threshold (set to 2.5 in the script): for these scores the category label is set to “3”. You can download the Python script that implements the baselines.
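The two baselines can be sketched as follows. This is a rough reimplementation of the description above, not the organizers' script: the label names, score values, and variable names are illustrative, though the 2.5 threshold and the “3” label come from the description.

```python
import random

# Hypothetical training labels and per-item Church-Gale scores for
# four test items (the real data files are not assumed here).
train_labels = ["1"] * 6 + ["2"] * 3 + ["3"] * 3
cg_scores = [0.4, 3.1, 1.0, 2.8]
THRESHOLD = 2.5  # threshold specified in the provided script

rng = random.Random(0)
categories = sorted(set(train_labels))
weights = [train_labels.count(c) for c in categories]

# Baseline 1: draw test labels at random, respecting the training
# set's category distribution.
baseline1 = rng.choices(categories, weights=weights, k=len(cg_scores))

# Baseline 2: start from baseline 1, then override the label with "3"
# wherever the Church-Gale score exceeds the threshold.
baseline2 = [("3" if score > THRESHOLD else label)
             for label, score in zip(baseline1, cg_scores)]
```

Note that baseline 1 is purely a sanity check against label skew, while baseline 2 adds the one signal (the Church-Gale score) that needs no training at all.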