Gender Prediction in English-Hindi Code-Mixed Social Media Content: Corpus and Baseline System.
Ankush Khandelwal, Sahil Swami, Syed Sarfaraz Akhtar, Manish Shrivastava. Published in: CoRR (2018)
Keyphrases
- statistical machine translation
- machine translation
- language identification
- social media content
- comparable corpora
- link grammar
- security informatics
- proper names
- person names
- contextual features
- sentiment analysis
- english words
- parallel corpus
- penn treebank
- internet usage
- natural language
- parallel corpora
- language model
- social media
- indian languages
- website
- open domain
- machine translation system
- word sense
- cross language information retrieval
- cross lingual
- spoken language
- multiword
- target language
- web search
- feature selection