Corpus: MANY

Licence: GNU GPL v3
URL: https://code.google.com/archive/p/many/

MANY is a machine translation (MT) system combination tool whose architecture is described in the following picture:

[Figure: architecture of the MANY system]

The combination can be decomposed into three steps:

  • The 1-best hypotheses from all M systems are aligned in order to build M confusion networks (CNs), one for each system taken in turn as the backbone.
  • All CNs are connected into a single lattice. The first node of each CN is connected to a unique initial node, with an arc probability equal to the prior probability assigned to the corresponding backbone. The final node of each CN is connected to a single final node with an arc probability of one.
  • A token-pass decoder is used along with a language model (LM) to decode the resulting lattice and generate the best hypothesis.
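
The second step can be sketched as follows. This is only an illustrative sketch, not the MANY implementation: the `Lattice` class, the `connect_confusion_networks` function, the node naming scheme and the representation of a CN as a list of slots (each slot mapping a word, or `None` for a null arc, to a probability) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Lattice:
    # arcs[node] -> list of (next_node, word, probability).
    # A word of None marks an arc that emits no word.
    arcs: dict = field(default_factory=dict)
    start: str = "START"
    end: str = "END"

    def add_arc(self, src, dst, word, prob):
        self.arcs.setdefault(src, []).append((dst, word, prob))


def connect_confusion_networks(cns, priors):
    """Join M confusion networks into a single lattice (hypothetical helper).

    cns:    list of CNs; each CN is a list of slots, and each slot is a
            dict mapping a word (or None for a null arc) to a probability.
    priors: one prior probability per CN, i.e. per backbone system.
    """
    if len(cns) != len(priors):
        raise ValueError("need exactly one prior per confusion network")

    lattice = Lattice()
    for i, (cn, prior) in enumerate(zip(cns, priors)):
        # Unique initial node -> first node of this CN, weighted by the prior.
        lattice.add_arc(lattice.start, f"cn{i}_0", None, prior)
        # Copy the CN itself: slot j links node j to node j + 1.
        for j, slot in enumerate(cn):
            for word, prob in slot.items():
                lattice.add_arc(f"cn{i}_{j}", f"cn{i}_{j + 1}", word, prob)
        # Last node of this CN -> single final node, with probability one.
        lattice.add_arc(f"cn{i}_{len(cn)}", lattice.end, None, 1.0)
    return lattice


# Two toy CNs (M = 2), each built around a different backbone.
cns = [
    [{"the": 0.7, "a": 0.3}, {"cat": 1.0}],
    [{"a": 0.6, "the": 0.4}, {"cat": 0.5, None: 0.5}],
]
lattice = connect_confusion_networks(cns, priors=[0.6, 0.4])
```

In this toy example the initial node has one outgoing arc per backbone, carrying the priors 0.6 and 0.4, and each CN ends with an arc of probability one into the shared final node.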

The score computed by the decoder for a hypothesis W can be expressed in terms of the following quantities:

  • Len(W) is the length of the hypothesis,
  • Pws(n) is the score of the n-th word,
  • α is the fudge factor,
  • Plm(n) is the LM probability of the n-th word,
  • Lenpen(W) is the length penalty of the word sequence,
  • Nullpen(W) is the penalty associated with the number of null arcs crossed to obtain the hypothesis.
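
As an illustration of how these terms might fit together, the sketch below assumes a log-linear form: log P(W) = sum over the Len(W) words of [log Pws(n) + α · log Plm(n)] + Lenpen(W) + Nullpen(W). This form is an assumption inferred from the definitions above, not a formula taken from the MANY code, and the function name `decoder_score` and its parameters are hypothetical.

```python
import math


def decoder_score(word_scores, lm_probs, alpha, len_pen=0.0, null_pen=0.0):
    """Score a hypothesis under the ASSUMED log-linear form.

    word_scores: Pws(n) for each word of the hypothesis (values in (0, 1]).
    lm_probs:    Plm(n) for each word of the hypothesis (values in (0, 1]).
    alpha:       fudge factor weighting the LM term.
    len_pen:     Lenpen(W), passed in as an already computed value.
    null_pen:    Nullpen(W), passed in as an already computed value.
    """
    if len(word_scores) != len(lm_probs):
        raise ValueError("need one word score and one LM probability per word")

    total = 0.0
    for p_ws, p_lm in zip(word_scores, lm_probs):
        # Per-word contribution: lattice score plus weighted LM score.
        total += math.log(p_ws) + alpha * math.log(p_lm)
    # Hypothesis-level penalties are added once.
    return total + len_pen + null_pen


# Two toy hypotheses of the same length, differing only in LM probability.
fluent = decoder_score([0.7, 1.0], [0.2, 0.3], alpha=0.5)
disfluent = decoder_score([0.7, 1.0], [0.2, 0.01], alpha=0.5)
```

Under this assumed form, a larger α gives the language model more influence relative to the word scores from the lattice, so `fluent` scores higher than `disfluent` for any positive α.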