Word n-gram probability estimation from a Japanese raw corpus.
Shinsuke Mori, Daisuke Takuma. Published in: INTERSPEECH (2004)
Keyphrases
- n-gram
- probability estimation
- language model
- text classification
- naive Bayes
- multi-class classification
- decision trees
- language modeling
- character n-grams
- language independent
- variable length
- word segmentation
- part of speech
- ROC curve
- web documents
- multi-class
- classification trees
- kNN
- word pairs
- word level
- language specific
- probabilistic model