hinglishNorm - A Corpus of Hindi-English Code Mixed Sentences for Text Normalization.
Piyush Makhija, Ankit Kumar, Anuj Gupta
Published in: COLING (Industry) (2020)
Keyphrases
- statistical machine translation
- machine translation system
- proper names
- training corpus
- noun phrases
- sentence level
- machine translation
- source language
- broad coverage
- mono lingual
- target language
- link grammar
- multiword
- text corpus
- word sense
- lexical features
- open domain
- linguistic features
- language identification
- word alignment
- comparable corpora
- english words
- parallel corpus
- cross lingual
- penn treebank
- named entities
- linguistic analysis
- word pairs
- indian languages
- sentence pairs
- parallel corpora
- natural language
- syntactic analysis
- english text
- semantic roles
- sentiment analysis
- document level
- word level
- named entity recognition
- text corpora
- text summarization
- plain text
- query translation
- linguistic patterns
- information extraction
- natural language text
- pos tagging
- contextual features
- language model
- natural language processing
- word sense disambiguation
- parse tree
- semantic parsing
- word frequency
- cross language information retrieval
- sentiment classification
- tree bank
- part of speech
- text mining
- named entity recognizer
- information retrieval
- unknown words
- text documents
- word order
- translation model
- text to speech
- document images
- question answering
- cross language
- arabic language
- co occurrence
- automatic summarization
- language independent
- spoken language
- dependency parsing
- natural language generation