Corpus: MANY
Licence: GNU GPL v3
URL: https://code.google.com/archive/p/many/
MANY is an MT system combination tool whose architecture is shown in the following picture:
The combination can be decomposed into three steps:
- The 1-best hypotheses from all M systems are aligned to build M confusion networks (one per system, each using that system's hypothesis as the backbone).
- All confusion networks (CNs) are connected into a single lattice. The first node of each CN is connected to a unique initial node with a probability equal to the prior probability assigned to the corresponding backbone. The final node of each CN is connected to a single final node with an arc probability of one.
- A token-pass decoder is used with a language model to decode the resulting lattice and generate the best hypothesis.
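The second step can be sketched in a few lines of Python. This is a minimal illustration, not MANY's actual code: the data layout (one `word -> probability` dictionary per CN slot), the `build_lattice` name, and the `<eps>` label used for the connecting arcs are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

NULL = "<eps>"  # hypothetical label for null (empty) arcs

# Assumed layout: a confusion network is a list of slots,
# each slot mapping a word (or NULL) to its probability.
ConfusionNetwork = List[Dict[str, float]]
Arc = Tuple[int, int, str, float]  # (source node, target node, label, probability)


def build_lattice(
    cns: List[ConfusionNetwork], priors: List[float]
) -> Tuple[List[Arc], int, int]:
    """Join confusion networks into one lattice with shared start and end nodes."""
    if len(cns) != len(priors):
        raise ValueError("one prior per confusion network is required")

    start = 0
    next_node = 1
    arcs: List[Arc] = []
    cn_final_nodes: List[int] = []

    for cn, prior in zip(cns, priors):
        first = next_node
        next_node += 1
        # Entry arc: shared start node -> first node of this CN,
        # weighted by the prior of the corresponding backbone.
        arcs.append((start, first, NULL, prior))

        current = first
        for slot in cn:
            target = next_node
            next_node += 1
            # One arc per competing word in this slot.
            for word, prob in slot.items():
                arcs.append((current, target, word, prob))
            current = target
        cn_final_nodes.append(current)

    end = next_node
    # Exit arcs: last node of each CN -> shared final node, probability one.
    for node in cn_final_nodes:
        arcs.append((node, end, NULL, 1.0))

    return arcs, start, end


if __name__ == "__main__":
    cn_a = [{"the": 0.7, "a": 0.3}, {"cat": 1.0}]
    cn_b = [{"a": 0.6, NULL: 0.4}]
    lattice, start_node, end_node = build_lattice([cn_a, cn_b], [0.6, 0.4])
    for arc in lattice:
        print(arc)
```

Every path from the start node to the end node passes through exactly one CN, so the backbone prior is applied once per hypothesis.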
The score computed by the decoder can be expressed as follows, where:
- Len(W) is the length of the hypothesis,
- Pws(n) is the score of the n-th word,
- α is the fudge factor,
- Plm(n) is the language model (LM) probability of the n-th word,
- Lenpen(W) is the length penalty of the word sequence,
- Nullpen(W) is the penalty associated with the number of null-arcs crossed to obtain the hypothesis.
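One way to combine these terms is a log-linear score of the form score(W) = sum over n of [log Pws(n) + α · log Plm(n)] + Lenpen(W) + Nullpen(W). The Python sketch below implements that form under explicit assumptions: α is taken to weight the LM term, both probabilities are taken in the log domain, and each penalty is modelled as a constant multiplied by a count. None of these details is confirmed by the text above, and the function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple


def hypothesis_score(
    word_probs: List[Tuple[float, float]],
    alpha: float,
    length_penalty: float,
    null_penalty: float,
    null_arcs: int,
) -> float:
    """Score one hypothesis W in the log domain.

    word_probs holds one (Pws(n), Plm(n)) pair per word; all probabilities
    must be strictly positive, since math.log(0) raises ValueError.
    """
    # Per-word term: lattice word score plus the alpha-weighted LM score
    # (assumption: alpha scales the LM term).
    word_term = sum(
        math.log(p_ws) + alpha * math.log(p_lm) for p_ws, p_lm in word_probs
    )
    # Lenpen(W): assumed linear in the hypothesis length Len(W).
    len_term = length_penalty * len(word_probs)
    # Nullpen(W): assumed linear in the number of null-arcs crossed.
    null_term = null_penalty * null_arcs
    return word_term + len_term + null_term


if __name__ == "__main__":
    # Two-word hypothesis that crossed two null-arcs.
    score = hypothesis_score(
        [(0.5, 0.1), (1.0, 0.2)],
        alpha=1.0,
        length_penalty=0.5,
        null_penalty=-1.0,
        null_arcs=2,
    )
    print(score)
```

Under these assumptions the decoder would keep the lattice path with the highest score; in practice the weights would be tuned on held-out data.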