DiaBLa: a corpus of bilingual spontaneous written dialogues for machine translation.
Rachel Bawden, Eric Bilinski, Thomas Lavergne, Sophie Rosset
Published in: Lang. Resour. Evaluation (2021)
Keyphrases
- machine translation
- statistical machine translation
- Chinese-English
- parallel corpora
- parallel corpus
- cross lingual
- machine translation system
- conversational speech
- word alignment
- cross language information retrieval
- language independent
- spontaneous speech
- natural language processing
- comparable corpora
- information extraction
- target language
- POS tagging
- language processing
- cross lingual information retrieval
- word sense disambiguation
- machine readable dictionaries
- query translation
- English-Chinese
- source language
- language resources
- word level
- finite state transducers
- bilingual lexicon
- bilingual dictionaries
- lexical knowledge
- natural language