Word Level Language Identification in English Telugu Code Mixed Data.
Sunil Gundapu, Radhika Mamidi
Published in: CoRR (2020)
Keyphrases
- word level
- language identification
- document images
- mixed data
- english text
- data compression
- document analysis
- indian languages
- language independent
- knn
- data sets
- similarity function
- optical character recognition
- machine translation
- clustering algorithm
- n gram
- text lines
- character recognition
- text retrieval
- high dimensional