Word n-gram probability estimation from a Japanese raw corpus.

Shinsuke Mori, Daisuke Takuma

Published in: INTERSPEECH (2004)

Keyphrases

n-gram
probability estimation
language model
text classification
naive Bayes
multi-class classification
decision trees
language modeling
character n-grams
language independent
variable length
word segmentation
part of speech
ROC curve
web documents
multi-class
classification trees
kNN
word pairs
word level
language specific
probabilistic model