ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic-English.
Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
Published in: WANLP@EMNLP (2022)
Keyphrases
- statistical machine translation
- parallel corpus
- machine translation
- machine translation system
- parallel corpora
- monolingual
- cross language information retrieval
- english words
- sentence pairs
- unknown words
- cross lingual
- comparable corpora
- Chinese-English
- speech recognition
- query translation
- Arabic language
- text to speech
- language identification
- spontaneous speech
- spoken language
- english text
- broadcast news
- language model
- link grammar
- conversational speech
- target language
- training corpus
- cross language
- word alignment
- source language
- speaker identification
- person names
- finite state transducers
- multiword
- bilingual dictionaries
- cross language retrieval
- language resources
- question answering
- open domain
- speech corpus
- out of vocabulary
- automatic speech recognition
- speech signal
- language independent
- language processing
- speech synthesis
- cross language IR
- word pairs
- natural language processing
- translation model
- pronominal anaphora
- information retrieval
- word forms
- wide coverage
- handwriting recognition
- dialogue system
- natural language
- broad coverage
- Arabic documents
- word sense disambiguation