Speaker: Rohit Gupta (University of Wolverhampton)
Date: 3 June 2015
Abstract: Current Translation Memory (TM) systems work at the surface level and lack semantic knowledge while matching. Most TMs use simple edit distance computed on the surface form, or on some variant of it (stems or lemmas), which takes no account of semantic aspects in matching. In this talk, I will present a novel and efficient approach to incorporating semantic information, in the form of paraphrases, into edit-distance matching. The approach is based on greedy approximation and dynamic programming. We have obtained significant improvements in both recall and precision, i.e., retrieving more segments of better quality. We have also carried out extensive human evaluation, measuring post-editing time, keystrokes, two subjective evaluations, HTER, HMETEOR, BLEU and METEOR to substantiate our research. Our results show that paraphrasing improves TM matching and retrieval, leading to gains in translation performance when translators use paraphrase-enhanced TMs. Furthermore, I will present our work on using semantic similarity for TM matching and retrieval, as well as some ongoing work using deep learning techniques.
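To make the contrast between surface edit distance and paraphrase-aware matching concrete, here is a minimal Python sketch. It is not the speaker's algorithm. It assumes a small, hypothetical paraphrase table (`para`) mapping a TM phrase to equivalent phrases, and it uses an exact dynamic program rather than the greedy approximation mentioned in the abstract. The idea is that a TM phrase may be matched against one of its listed paraphrases in the input at zero cost, in addition to the usual word-level insert, delete and substitute operations.

```python
from typing import Dict, List, Tuple

# Hypothetical paraphrase table: TM phrase -> list of equivalent (non-empty) phrases.
Paraphrases = Dict[Tuple[str, ...], List[Tuple[str, ...]]]


def edit_distance(tm: List[str], query: List[str]) -> int:
    """Plain word-level Levenshtein distance (insert/delete/substitute cost 1)."""
    m, n = len(tm), len(query)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if tm[i - 1] == query[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]


def paraphrase_edit_distance(tm: List[str], query: List[str], para: Paraphrases) -> int:
    """Edit distance in which a TM phrase may match one of its paraphrases at zero cost.

    d[i][j] is the distance between tm[:i] and query[:j]. Assumption for
    illustration only: paraphrase matches are free; a real system would
    likely weight them.
    """
    m, n = len(tm), len(query)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if tm[i - 1] == query[j - 1] else 1
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Try every TM phrase tm[k:i] ending at position i.
            for k in range(i):
                for p in para.get(tuple(tm[k:i]), []):
                    length = len(p)
                    # The paraphrase must end exactly at position j of the query.
                    if 0 < length <= j and tuple(query[j - length:j]) == p:
                        best = min(best, d[k][j - length])
            d[i][j] = best
    return d[m][n]


if __name__ == "__main__":
    tm_seg = "he passed away yesterday".split()
    query = "he died yesterday".split()
    para = {("passed", "away"): [("died",)]}
    print(edit_distance(tm_seg, query))                   # 2
    print(paraphrase_edit_distance(tm_seg, query, para))  # 0
```

In this toy example the surface distance is 2 (one substitution plus one deletion), so a purely surface-based TM would rank the segment lower, whereas the paraphrase-aware distance is 0. Allowing arbitrary phrase spans makes the inner loops more expensive than plain Levenshtein, which is presumably one motivation for the efficiency concerns the abstract raises.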