Mixat: A Data Set of Bilingual Emirati-English Speech.
Maryam Al Ali, Hanan Aldarmaki
Published in: CoRR (2024)
Keyphrases
- data sets
- machine translation
- cross lingual
- cross language
- broadcast news
- text to speech
- query translation
- cross language information retrieval
- language resources
- english text
- parallel corpus
- spoken language
- chinese english
- statistical machine translation
- parallel corpora
- speech recognition
- english chinese
- comparable corpora
- cross language retrieval
- target language
- bilingual lexicon
- english language
- multiword
- finite state transducers
- proper names
- word alignment
- spoken document retrieval
- source language
- real world
- audio visual
- speech recognition technology
- text retrieval
- bilingual dictionaries
- sentence pairs
- language independent
- machine translation system
- automatic speech recognition
- question answering
- natural language processing
- machine readable dictionaries
- training data
- text classification
- natural language
- training set
- co occurrence
- indian languages
- language model
- answer questions
- text categorization
- information retrieval systems
- cross lingual information retrieval
- spontaneous speech
- information extraction
- wordnet
- information access
- speaker identification
- speech signal
- noisy environments