Incremental N-gram Approach for Language Identification in Code-Switched Text.
Prajwol Shrestha
Published in: CodeSwitch@EMNLP (2014)
Keyphrases
- n-gram
- language identification
- language model
- document images
- variable length
- language independent
- text classification
- bag of words
- speaker identification
- character n-grams
- language modeling
- web documents
- Indian languages
- word segmentation
- word level
- Viterbi algorithm
- information retrieval
- language specific
- text retrieval
- text data
- document retrieval
- non-stationary
- text mining
- information extraction
- classification accuracy