N-Gram Language Modeling for Robust Multi-Lingual Document Classification.
Jörg Steffen
Published in: LREC (2004)
Keyphrases
- language modeling
- n-gram
- document classification
- text classification
- language model
- language-independent
- cross-lingual
- web documents
- text categorization
- statistical language modeling
- text mining
- text documents
- bag of words
- query expansion
- retrieval model
- information retrieval
- feature selection
- machine learning
- cross-language
- probabilistic model
- document retrieval
- labeled data
- unsupervised learning
- data analysis
- digital libraries
- web pages
- naive bayes
- kNN
- word segmentation