Native Language Identification Using a Mixture of Character and Word N-grams.

Elham Mohammadi, Hadi Veisi, Hessam Amini

Published in: BEA@EMNLP (2017)

Keyphrases

n-gram
language identification
language model
mixture model
speaker identification
word segmentation
document images
text classification
bag of words
variable length
language independent
part of speech
text lines
Gaussian mixture model
language modeling
character n-grams
Indian languages
web documents
expectation maximization
machine learning
language specific
data mining