N-gram Statistics in English and Chinese: Similarities and Differences.
Stewart Yang, Hongjun Zhu, Ariel Apostoli, Pei Cao
Published in: ICSC (2007)
Keyphrases
- n gram
- word segmentation
- word level
- language specific
- english text
- character n grams
- language model
- chinese language
- language independent
- language modeling
- dependency parser
- variable length
- text classification
- bag of words
- part of speech
- cross lingual
- language modelling
- natural language
- inside outside algorithm
- chinese characters
- information retrieval
- viterbi algorithm
- machine translation
- similarity measure
- statistical language modeling
- data mining
- parse tree
- neural network
- query terms
- out of vocabulary
- web documents