Software: Continuous Space Language Model toolkit (CSLM)
CSLM toolkit is open-source software which implements the so-called continuous space language model. The basic idea of this approach is to project the word indices onto a continuous space and to use a probability estimator operating on this space. Since the resulting probability functions are smooth functions of the word representation, better generalization to unknown events can be expected. A neural network can be used to simultaneously learn the projection of the words onto the continuous space and to estimate the n-gram probabilities. This is still a n-gram approach, but the LM probabilities are interpolated for any possible context of length n-1 instead of backing-off to shorter contexts. This approach was successfully used in large vocabulary continuous speech recognition and in phrase-based SMT systems. Detailed information is available in the following publications:
When using this software, please cite those references. The development of the CSLM toolkit was partially financed by the European projects EuroMatrix and Matecat, the ANR project COSMAT and the DARPA project BOLT.
The toolkit will be frequently updated. You can join the CSLM google group to be informed on updates, bug fixes or discuss best usage.