Towards Better Inclusivity: A Diverse Tweet Corpus of English Varieties.
Nhi Pham, Lachlan Pham, Adam L. Meyers
Published in: CoRR (2024)
Keyphrases
- link grammar
- person names
- statistical machine translation
- parallel corpus
- topic tracking
- open domain
- wide coverage
- named entities
- broad coverage
- english words
- english language
- training corpus
- machine translation
- monolingual
- cross lingual
- answer questions
- multiword
- language specific
- natural language
- social media
- sentence pairs
- language learning
- linguistic features
- sentiment analysis
- word sense
- penn treebank
- comparable corpora
- information extraction
- machine translation system
- semantic roles
- manually annotated
- treebank
- bag of words
- natural language processing
- language model
- text classification
- stop words
- topic detection and tracking
- chinese english
- n-gram
- word level
- sentence level
- cross language