Miguel Rios’ PhD thesis: Methods for Measuring Semantic Similarity of Texts

Miguel Rios (2014) Methods for Measuring Semantic Similarity of Texts. PhD Thesis, University of Wolverhampton, UK

Abstract

Measuring semantic similarity is a task needed in many Natural Language Processing (NLP) applications. For example, in Machine Translation evaluation, semantic similarity is used to assess the quality of the machine translation output by measuring the degree of equivalence between a reference translation and the machine translation output. The problem of semantic similarity (Corley and Mihalcea, 2005) is defined as measuring and recognising semantic relations between two texts. Semantic similarity covers different types of semantic relations, mainly bidirectional and directional. This thesis proposes new methods to address the limitations of existing work on both types of semantic relations.

Recognising Textual Entailment (RTE) addresses a directional relation in which a text T entails a hypothesis H (an entailment pair) if the meaning of H can be inferred from the meaning of T (Dagan and Glickman, 2005; Dagan et al., 2013). Most RTE methods rely on machine learning algorithms. de Marneffe et al. (2006) propose a multi-stage architecture in which a first stage computes an alignment between T and H, followed by an entailment decision stage. A limitation of such approaches is that the alignment stage does not recognise non-entailment: it always returns the alignment that best fits an optimisation criterion, and that alignment by itself is a poor predictor of non-entailment. We propose an RTE method that follows a multi-stage architecture in which both stages are based on semantic representations. Furthermore, instead of using simple similarity metrics to predict the entailment decision, we use a Markov Logic Network (MLN). The MLN is based on rich relational features extracted from the predicate-argument alignment structures between T-H pairs. It learns to reward pairs with similar predicates and similar arguments, and to penalise all other pairs. The proposed methods show promising results. The alignment step, which has low coverage, was found to be a source of errors. However, we show that when an alignment is found, the relational features improve the final entailment decision.
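The reward/penalise behaviour of such an MLN can be illustrated with a toy sketch in Python (standard library only) that scores a single query atom, Entails, given alignment evidence. The `AlignedPredicate` record, the two formulas and their weights are hypothetical illustrations chosen for this sketch, not the formulas, features or learned weights of the thesis, and weight learning is omitted. The sketch relies on a standard property of MLNs: when every other atom is observed, the conditional probability of one query atom depends only on the weighted counts of satisfied formula groundings in the two possible worlds (Entails true or false).

```python
import math
from dataclasses import dataclass


@dataclass
class AlignedPredicate:
    """Hypothetical evidence for one aligned T-H predicate pair."""
    similar_predicate: bool
    similar_arguments: bool


def reward_rule(a: AlignedPredicate, entails: bool) -> bool:
    # Hypothetical formula: SimPred(a) AND SimArgs(a) => Entails
    return not (a.similar_predicate and a.similar_arguments) or entails


def penalty_rule(a: AlignedPredicate, entails: bool) -> bool:
    # Hypothetical formula: SimPred(a) AND NOT SimArgs(a) => NOT Entails
    return not (a.similar_predicate and not a.similar_arguments) or not entails


# (weight, formula) pairs; the weights are made up for illustration.
FORMULAS = [(1.5, reward_rule), (1.0, penalty_rule)]


def weighted_count(alignments, entails: bool) -> float:
    """Sum over formulas of weight * number of satisfied groundings."""
    return sum(w * sum(f(a, entails) for a in alignments) for w, f in FORMULAS)


def p_entails(alignments) -> float:
    """P(Entails = True | evidence) for a single query atom."""
    s_true = weighted_count(alignments, True)
    s_false = weighted_count(alignments, False)
    # Equivalent to exp(s_true) / (exp(s_true) + exp(s_false))
    return 1.0 / (1.0 + math.exp(s_false - s_true))


if __name__ == "__main__":
    both_similar = [AlignedPredicate(True, True), AlignedPredicate(True, True)]
    args_differ = [AlignedPredicate(True, False)]
    print(p_entails(both_similar))  # high: both groundings reward entailment
    print(p_entails(args_differ))   # low: the penalty rule fires
    print(p_entails([]))            # 0.5: no alignment, so no evidence
```

With no alignment, both worlds receive the same score and the probability stays at 0.5, which mirrors the observation that the relational features only help when an alignment is found.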

The task of Semantic Textual Similarity (STS) (Agirre et al., 2012) is defined as measuring the degree of bidirectional semantic equivalence between a pair of texts. The STS evaluation campaigns use datasets that consist of pairs of texts from NLP tasks such as Paraphrasing and Machine Translation evaluation. Methods for STS are commonly based on computing similarity metrics between the two sentences of a pair and using the resulting scores as features to train regression algorithms. Existing methods for STS perform well on some tasks but poorly on others, particularly on unknown (surprise) tasks. Our solution to alleviate this unbalanced performance is to model STS in the context of Multi-task Learning using Gaussian Processes (MTL-GP) (Alvarez et al., 2012) and state-of-the-art STS features (Saric et al., 2012). We show that the MTL-GP outperforms previous work on the same datasets.
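The regression set-up and the multi-task idea can be sketched in Python (standard library only): each sentence pair is mapped to similarity features, and a Gaussian Process shares information across tasks through a multi-task kernel. The sketch assumes the intrinsic coregionalisation model (ICM), one common multi-task kernel, in which k((x, t), (x', t')) = B[t][t'] · k_rbf(x, x') with B a task covariance matrix. The two features (token Jaccard overlap and length ratio) are simple stand-ins for the much richer Saric et al. (2012) feature set. The toy sentence pairs, gold scores, task covariance `B_TASKS`, length scale and noise level are all hypothetical and fixed by hand, whereas in practice the hyperparameters would be learned, for example by maximising the marginal likelihood. Only the posterior mean is computed.

```python
import math

# --- Features: simple stand-ins for real STS similarity metrics ---

def token_jaccard(a: str, b: str) -> float:
    """Word-overlap (Jaccard) similarity between two sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def length_ratio(a: str, b: str) -> float:
    """Ratio of the shorter to the longer sentence length, in tokens."""
    la, lb = len(a.split()), len(b.split())
    return min(la, lb) / max(la, lb) if max(la, lb) else 1.0


def features(pair):
    return [token_jaccard(*pair), length_ratio(*pair)]


# --- Multi-task kernel (ICM): task covariance times an RBF kernel ---

def rbf(x, y, lengthscale=0.5):
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-d2 / (2 * lengthscale ** 2))


def icm_kernel(x, t, x2, t2, B):
    return B[t][t2] * rbf(x, x2)


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def mtl_gp_predict(train, B, query_pair, query_task, noise=0.1):
    """Posterior mean of a multi-task GP; train = [(pair, task_id, score)]."""
    X = [features(p) for p, _, _ in train]
    T = [t for _, t, _ in train]
    y_mean = sum(s for _, _, s in train) / len(train)
    y = [s - y_mean for _, _, s in train]  # centre the targets
    n = len(train)
    K = [[icm_kernel(X[i], T[i], X[j], T[j], B) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    xq = features(query_pair)
    return y_mean + sum(icm_kernel(xq, query_task, X[i], T[i], B) * alpha[i]
                        for i in range(n))


# Hypothetical toy data: two tasks (0 and 1) with scores on a 0-5 scale.
TOY_TRAIN = [
    (("a man is playing a guitar", "a man plays a guitar"), 0, 4.2),
    (("a cat sits on the mat", "a dog runs in the park"), 0, 0.4),
    (("the president visited paris", "the president travelled to paris"), 1, 4.0),
    (("stocks fell sharply today", "the team won the final"), 1, 0.2),
]
B_TASKS = [[1.0, 0.8], [0.8, 1.0]]  # hypothetical task covariance

if __name__ == "__main__":
    q = ("the minister visited london", "the minister travelled to london")
    print(mtl_gp_predict(TOY_TRAIN, B_TASKS, q, query_task=1))
```

Because B couples the two tasks, training pairs from task 0 also inform predictions for task 1. With B set to the identity, the tasks become independent GPs, and a task with no training data falls back to the mean training score.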

BibTeX

@PhdThesis{rios-phd,
   author =   {Miguel Angel Rios Gaona},
   title =    {Methods for Measuring Semantic Similarity of Texts},
   year =     {2014},
   school =   {University of Wolverhampton},
   address =  {Wolverhampton, UK},
   URL =      {http://rgcl.wlv.ac.uk/wp-content/uploads/2015/04/thesis-Miguel_Angel_Rios_Gaona.pdf}
}
