Language identification in multilingual, short and noisy texts using common N-grams.
Dijana Kosmajac, Vlado Keselj
Published in: IEEE BigData (2017)
Keyphrases
- n-gram
- language identification
- language independent
- multilingual
- language specific
- indian languages
- language model
- speaker identification
- text classification
- bag of words
- document images
- noisy environments
- variable length
- word segmentation
- part of speech
- language modeling
- cross-lingual
- keywords
- text documents
- digital libraries
- natural language
- speech recognition
- web documents
- image classification
- kNN
- data analysis
- word level
- neural network