The TED-LIUM corpus consists of English-language TED talks with transcriptions, sampled at 16 kHz. It contains about 118 hours of speech.
More details are given in this paper:
A. Rousseau, P. Deléglise, and Y. Estève, “TED-LIUM: an automatic speech recognition dedicated corpus”, in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), May 2012.
Please cite this reference if you use this data in your research.