Language identification in multilingual, short and noisy texts using common N-grams.

Dijana Kosmajac, Vlado Keselj

Published in: IEEE BigData (2017)

Keyphrases

n-gram
language identification
language independent
multilingual
language specific
indian languages
language model
speaker identification
text classification
bag of words
document images
noisy environments
variable length
word segmentation
part of speech
language modeling
cross-lingual
keywords
text documents
digital libraries
natural language
speech recognition
web documents
image classification
kNN
data analysis
word level
neural network