Wilker Aziz’s PhD thesis: Exact Sampling and Optimisation in Statistical Machine Translation

Wilker Aziz (2013) Exact Sampling and Optimisation in Statistical Machine Translation. PhD Thesis, University of Wolverhampton, UK

Abstract

In Statistical Machine Translation (SMT), inference needs to be performed over a high-complexity discrete distribution defined by the intersection between a translation hypergraph and a target language model. This distribution is too complex to be represented exactly and one typically resorts to approximation techniques either to perform optimisation — the task of searching for the optimum translation — or sampling — the task of finding a subset of translations that is statistically representative of the goal distribution. Beam-search is an example of an approximate optimisation technique, where maximisation is performed over a heuristically pruned representation of the goal distribution.

For inference tasks other than optimisation, rather than finding a single optimum, one is really interested in obtaining a set of probabilistic samples from the distribution. This is the case in training where one wishes to obtain unbiased estimates of expectations in order to fit the parameters of a model. Samples are also necessary in consensus decoding where one chooses from a sample of likely translations the one that minimises a loss function. Due to the additional computational challenges posed by sampling, n-best lists, a by-product of optimisation, are typically used as a biased approximation to true probabilistic samples. A more direct procedure is to attempt to directly draw samples from the underlying distribution rather than rely on n-best list approximations.

Markov Chain Monte Carlo (MCMC) methods, such as Gibbs sampling, offer a way to overcome the tractability issues in sampling, however their convergence properties are hard to assess. That is, it is difficult to know when, if ever, an MCMC sampler is producing samples that are compatible with the goal distribution. Rejection sampling, an MC method, is more fundamental and natural, it offers strong guarantees, such as unbiased samples, but is typically hard to design for distributions of the kind addressed in SMT, rendering an intractable method.

A recent technique that stresses a unified view between the two types of inference tasks discussed here — optimisation and sampling — is the OS* approach. OS* can be seen as a cross between Adaptive Rejection Sampling (an MC method) and A* optimisation. In this view the intractable goal distribution is upperbounded by a simpler (thus tractable) proxy distribution, which is then incrementally refined to be closer to the goal until the maximum is found, or until the sampling performance exceeds a certain level.

This thesis introduces an approach to exact optimisation and exact sampling in SMT by addressing the tractability issues associated with the intersection between the translation hypergraph and the language model. The two forms of inference are handled in a unified framework based on the OS* approach. In short, an intractable goal distribution, over which one wishes to perform inference, is upperbounded by tractable proposal distributions.

A proposal represents a relaxed version of the complete space of weighted translation derivations, where relaxation happens with respect to the incorporation of the language model. These proposals give an optimistic view on the true model and allow for easier and faster search using standard dynamic programming techniques. In the OS* approach, such proposals are used to perform a form of adaptive rejection sampling. In rejection sampling, samples are drawn from a proposal distribution and accepted or rejected as a function of the mismatch between the proposal and the goal. The technique is adaptive in that rejected samples are used to motivate a refinement of the upperbound proposal that brings it closer to the goal, improving the rate of acceptance. Optimisation can be connected to an extreme form of sampling, thus the framework introduced here suits both exact optimisation and exact sampling.

BibTeX

@PhdThesis{aziz-phd,
   author =   {Wilker Aziz},
   title =    {Exact Sampling and Optimisation in Statistical Machine                         Translation},
   year =     {2014},
   address =  {Wolverhampton, UK},
   URL =      {http://clg.wlv.ac.uk/papers/aziz-thesis.pdf}
}