CMIR: A Corpus for Evaluation of Code Mixed Information Retrieval of Hindi-English Tweets.
Kunal Chakma, Amitava Das
Published in: Computación y Sistemas (2016)
Keyphrases
- information retrieval
- statistical machine translation
- machine translation
- information extraction
- person names
- named entities
- proper names
- social media
- natural language
- comparable corpora
- broad coverage
- open domain
- text mining
- ir evaluation
- document collections
- link grammar
- search engine
- information retrieval systems
- language model
- cross language
- source code
- language modeling
- language identification
- spoken language
- multiword
- parallel corpora
- news articles
- document retrieval
- contextual features
- broadcast news
- document corpus
- wide coverage
- training corpus
- penn treebank