Native Language Identification Using a Mixture of Character and Word N-grams.
Elham Mohammadi, Hadi Veisi, Hessam Amini
Published in: BEA@EMNLP (2017)
Keyphrases
- n-gram
- language identification
- language model
- mixture model
- speaker identification
- word segmentation
- document images
- text classification
- bag of words
- variable length
- language independent
- part of speech
- text lines
- gaussian mixture model
- language modeling
- character n-grams
- indian languages
- web documents
- expectation maximization
- machine learning
- language specific
- data mining