Corpus: ALLIES


Description

The ALLIES Corpus was produced within the European CHIST-ERA project ALLIES. The project made it possible to run an evaluation campaign for broadcast-news diarization systems across time, using French data. It extends the previous ESTER, REPERE and ETAPE evaluation campaigns carried out for French in this field.

This corpus is based on the material used for the ESTER 1 & 2 (including 128 files from EPAC), REPERE and ETAPE evaluation packages, together with new data collected since 2014 (see the ELRA Catalogue: http://catalogue.elra.info for the respective packages). Built as an extension of these previously produced corpora, it contains corrected annotations from the earlier evaluation materials as well as new audio data with corresponding transcriptions. Corrections include fixed speaker names and re-segmentation.

The segmentation tasks comprise sound-event segmentation, speaker tracking and speaker segmentation, detailed as follows:

  • Sound-event segmentation consists of tracking the parts that contain music (with or without speech) and the parts that contain speech (with or without music).
  • Speaker tracking consists of detecting the parts of the document that correspond to a given speaker.
  • Speaker segmentation consists of segmenting the document into speaker turns and grouping the parts spoken by the same speaker.

 

Content

Overall, the ALLIES Corpus contains about 900 hours of broadcast news, including orthographic transcriptions, speaker annotations and segmentation.

  • 1176 WAV files (around 500 hours of speech)
  • 1176 TRS files (speaker turns and orthographic transcriptions)
  • A train/test partition
    • Train: 545 + 128 files
    • DiarTest-SeenShows: 181 files with shows already present in the train split
    • DiarTest-UnseenShows: 286 files with shows that are not in the train split
    • FullTest-CleanAnnot: 35 files manually checked, with music and noise annotations
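As a sanity check after downloading, the partition above can be expressed as a small mapping. This is an illustrative sketch only: the split names follow the description, but the dictionary itself is not part of the distribution.

```python
# Expected file counts per split, taken from the partition described above.
# Illustrative sketch, not part of the distributed corpus.
EXPECTED_COUNTS = {
    "Train": 545 + 128,            # the 128 extra files originate from EPAC
    "DiarTest-SeenShows": 181,
    "DiarTest-UnseenShows": 286,
    "FullTest-CleanAnnot": 35,
}

# Total number of files across all splits, to compare against a local copy.
total = sum(EXPECTED_COUNTS.values())
```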

 

If you use this data, please cite the following paper:

Marie Tahon, Anthony Larcher, Martin Lebourdais, Fethi Bougares, Ana Silnova, Pablo Gimeno. ALLIES: A Speech Corpus for Segmentation, Speaker Diarization, Speech Recognition and Speaker Change Detection. In Proc. of LREC-COLING, Torino, Italy, 2024.

Access through the ELRA catalogue: https://catalog.elra.info/en-us/repository/browse/ELRA-S0486/
 

Additional annotations and associated studies

1 – Overlapped speech type and emotion annotations in ALLIES

Description

Interruption detection is a new and challenging task in the field of speech processing. We provide overlapped-speech annotations on a selection of conversational data from ALLIES.
This selection includes 4000 segments in which at least two speakers are present. The annotated segments cover 4 seconds before the overlap segment and 4 seconds after. This corpus serves as a valuable resource for evaluating and advancing interruption-detection techniques.
A first baseline system, which uses speech processing methods to automatically identify interruptions in speech, is presented, together with its evaluation, in the article below. These findings can serve as a foundation for further research in the field and provide a benchmark for assessing future advances in automatic speech-interruption detection.

 

Content

  • Show name, start and stop times of 4000 segments
  • Split: the subset to which the segment belongs (train/test)
  • Type of overlap (ovtype)
  • Emotion before (emoA) and after (emoB) the overlap segment
  • Dominance before (dominance) and after the overlap segment

If you use this data, please cite the following paper:
Martin Lebourdais, Marie Tahon, Antoine Laurent and Sylvain Meignier. Automatic Speech Interruption Detection: Analysis, Corpus, and System. In Proc. of LREC-COLING, Torino, Italy, 2024. Link: https://hal.science/hal-04576488

Link to download the data: lrec_2024_inter_annotations

Format of the CSV:
show,split,start,stop,ovtype1,ovtype2,ovtype3,emoA1,emoA2,emoA3,emoB1,emoB2,emoB3,emoC1,emoC2,emoC3,dominance1,dominance2,dominance3
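As a minimal sketch, the CSV can be loaded with the Python standard library. Only the header below matches the format above; the data row, label values and time codes are invented for illustration.

```python
import csv
import io

# Illustrative sketch: the header matches the CSV format above, but the data
# row (show name, labels, time codes) is invented for demonstration only.
sample = """show,split,start,stop,ovtype1,ovtype2,ovtype3,emoA1,emoA2,emoA3,emoB1,emoB2,emoB3,emoC1,emoC2,emoC3,dominance1,dominance2,dominance3
showA,train,12.3,20.3,interruption,,,neutral,,,anger,,,neutral,,,high,,
"""

def load_segments(text):
    """Parse the annotation CSV and convert time codes to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["start"] = float(row["start"])
        row["stop"] = float(row["stop"])
        rows.append(row)
    return rows

segments = load_segments(sample)
train = [s for s in segments if s["split"] == "train"]
```

In practice one would pass the contents of the downloaded annotation file instead of the in-memory string.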

 

2 – Transition-Relevance Places and Interruptions in ALLIES

Description

Few speech resources describe interruption phenomena, especially for TV and media content. The description of these phenomena may vary across authors, which leaves room for improved annotation protocols.

We provide annotations of Transition-Relevance Places (TRPs) and speech floor-taking event types on the FullTest-CleanAnnot subset. 2041 audio segments have been selected so that each contains a speaker change in the middle; this speaker change may or may not involve overlapping speech. The first interval starts at the beginning of an utterance and ends at the speaker change. The last interval (second or third) starts after the speaker change or the overlap and ends at the end of the next utterance.

Each speaker change is annotated with the presence or absence of a TRP (Term/NonTerm) and a classification of the next speaker's turn-taking (smooth, backchannel, cooperative or competitive, successful or attempted interruption). An inter-rater agreement analysis shows that such annotations have moderate to substantial reliability. These results underline the importance of low-level features such as TRPs for deriving a classification of turn changes that is less subject to interpretation. The analysis of overlapping speech highlights the existence of interruptions without overlap and of smooth transitions with overlap.

 

Content

(X indexes the two speakers and Y the intervals, as detailed below)

  • fname: File name
  • tstart: Timecode (in seconds) of the start of the sample
  • tstop: Timecode (in seconds) of the end of the sample
  • spkX: Name of speaker X (X=0 or 1)
  • n_interval: Total number of intervals (2 or 3)
  • durY: Duration (in seconds) of interval Y (Y = first, last or ov). If the number of intervals is 2, there is no overlapping speech and dur_ov = 0
  • activity_Y_X: Does speaker X speak in interval Y? (True/False)
  • term_X_Y: Terminality classification of speaker X in interval Y (Term/NonTerm)
  • turntype_X_Y: Turn-Taking classification of speaker X in interval Y
  • comment: Comment made by the annotator (in French)
  • invalid: Classified as Invalid by the annotator (True/False)
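The fields above can be filtered in a few lines. This is a sketch under assumptions: the annotations are assumed to ship as a CSV, the expanded column names (spk0, dur_ov, ...) and the two sample rows are invented for demonstration, and only a subset of the listed fields is shown.

```python
import csv
import io

# Sketch under assumptions: a CSV with a subset of the fields listed above,
# with X/Y expanded into concrete column names (spk0, dur_ov, ...).
# The two sample rows are invented for demonstration only.
sample = """fname,tstart,tstop,spk0,spk1,n_interval,dur_first,dur_last,dur_ov,invalid
showA,10.0,16.5,Alice,Bob,3,2.0,3.1,0.4,False
showB,40.2,45.0,Carol,Dan,2,2.5,2.3,0,False
"""

def valid_cases(text):
    """Drop samples marked invalid and flag turn changes with overlap."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["invalid"] == "True":
            continue
        # dur_ov = 0 means two intervals and no overlapping speech
        row["has_overlap"] = float(row["dur_ov"]) > 0
        out.append(row)
    return out

cases = valid_cases(sample)
```

This separates, for example, speaker changes with overlapping speech from smooth transitions without it, mirroring the analysis described above.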

If you use this data, please cite the following paper:
Rémi Uro, Marie Tahon, Jane Wottawa, David Doukhan, Albert Rillard, Antoine Laurent. Annotation of Transition-Relevance Places and Interruptions for the Description of Turn-Taking in Conversations in French Media Content, In Proc. of LREC-COLING, Torino, Italy, 2024.

Link to download the data: lrec_2024_turn_taking_annotations_clean