An English-Hindi Code-Mixed Corpus: Stance Annotation and Baseline System.
Sahil Swami, Ankush Khandelwal, Vinay Singh, Syed Sarfaraz Akhtar, Manish Shrivastava
Published in: CoRR (2018)
Keyphrases
- statistical machine translation
- annotated corpus
- inter-annotator agreement
- named entity recognition
- language identification
- machine translation
- comparable corpora
- proper names
- link grammar
- person names
- indian languages
- parallel corpora
- semantic annotation
- cross lingual
- open domain
- training corpus
- wide coverage
- parallel corpus
- monolingual
- english words
- contextual features
- cross language information retrieval
- broad coverage
- target language
- multiword
- named entities
- english language
- source language
- active learning
- machine translation system
- query translation
- information extraction
- word sense
- sentence pairs
- metadata
- linguistic features
- penn treebank
- natural language
- word order
- answer questions
- labor intensive
- image annotation
- source code
- natural language processing
- tree bank
- image retrieval