MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible.
Marcely Zanon Boito, William Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier
Published in: LREC (2020)
Keyphrases
- parallel corpus
- natural language
- sentence level
- training corpus
- linguistic features
- sentence pairs
- cross lingual
- text corpus
- text generation
- document level
- statistical machine translation
- cross language information retrieval
- conversational speech
- language independent
- recognizing textual entailment
- language understanding
- machine translation system
- information retrieval
- sentiment analysis
- semantic roles
- manually annotated
- text classification
- parallel corpora
- speech recognition
- digital libraries
- machine translation
- spontaneous speech
- semantic analysis
- probabilistic context free grammars
- cross language