ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English.
Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
Published in: CoRR (2022)
Keyphrases
- statistical machine translation
- parallel corpus
- machine translation
- machine translation system
- parallel corpora
- cross language information retrieval
- monolingual
- english words
- broadcast news
- comparable corpora
- cross lingual
- language identification
- query translation
- sentence pairs
- speech recognition
- chinese english
- unknown words
- arabic language
- speaker identification
- text to speech
- language resources
- spontaneous speech
- cross language
- link grammar
- training corpus
- word alignment
- speech corpus
- source language
- conversational speech
- spoken language
- translation model
- speech synthesis
- target language
- speech signal
- english text
- person names
- language independent
- automatic speech recognition
- wide coverage
- out of vocabulary
- spoken document retrieval
- finite state transducers
- pronominal anaphora
- broad coverage
- morphological analysis
- language model
- open domain
- cross language retrieval
- bilingual dictionaries
- cross language ir
- multiword
- noun phrases
- hidden markov models
- arabic documents
- wordnet
- word sense disambiguation
- language processing
- word sense
- semantic roles
- word level