Corpus: ALLIES (Corpus ALLIES)
Author(s): |
The ALLIES Corpus was produced within the European CHIST-ERA project ALLIES. The ALLIES project made it possible to run an evaluation campaign for broadcast news diarization systems across time, using French data. The project extends the earlier ESTER, REPERE and ETAPE evaluation campaigns carried out for the French language in this field.
This corpus is based on the material used for the ESTER 1&2 (including 128 files from EPAC), REPERE and ETAPE evaluation packages, together with new data collected since 2014 (see the ELRA Catalogue, http://catalogue.elra.info, for the respective packages). The ALLIES corpus was built as an extension of these previously produced corpora. It contains corrected annotations from the earlier evaluation materials as well as new audio data with corresponding transcriptions. Corrections include corrected speaker names and re-segmentation.
The segmentation tasks consist of segmentation into sound events, speaker tracking and speaker segmentation.
Content
Overall, the ALLIES Corpus contains about 900 hours of broadcast news, including orthographic transcriptions, speaker annotations and segmentation.