MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible.
Marcely Zanon Boito, William N. Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier
Published in: CoRR (2019)
Keyphrases
- parallel corpus
- natural language
- sentence level
- sentence pairs
- cross lingual
- linguistic features
- training corpus
- noun phrases
- text generation
- text corpus
- speech recognition
- predicate argument
- digital libraries
- recognizing textual entailment
- spontaneous speech
- semantic role labeling
- document level
- semantic roles
- spoken language
- part of speech
- language independent
- cross language information retrieval
- cross language
- manually annotated
- text corpora
- natural language text
- machine translation