Word Level Language Identification in English Telugu Code Mixed Data.
Sunil Gundapu, Radhika Mamidi
Published in: PACLIC (2018)
Keyphrases
- word level
- language identification
- document images
- mixed data
- english text
- document analysis
- data compression
- indian languages
- data sets
- language independent
- knn
- clustering algorithm
- similarity function
- optical character recognition
- machine translation
- n gram
- text lines
- non stationary
- co occurrence
- pattern recognition