Extension of Zipf's Law to Word and Character N-grams for English and Chinese.
Le Quan Ha, Elvira I. Sicilia-Garcia, Ji Ming, Francis Jack Smith
Published in: Int. J. Comput. Linguistics Chin. Lang. Process. (2003)
Keyphrases
- character n grams
- n gram
- cross language
- cross language information retrieval
- variable length
- english chinese
- word segmentation
- language specific
- language model
- optical character recognition
- language independent
- english text
- text retrieval
- machine translation
- bag of words
- unknown words
- language modeling
- text classification
- query translation
- cross lingual
- word level
- morphological analysis
- information access
- parallel corpora
- bilingual dictionaries
- question answering
- machine learning
- document images
- document collections