Document preprocessing for naive Bayes classification and clustering with mixture of multinomials.
Dmitry Pavlov, Ramnath Balasubramanyan, Byron Dom, Shyam Kapur, Jignashu Parikh
Published in: KDD (2004)
Keyphrases
- text clustering
- preprocessing
- naive bayes classification
- document clustering
- clustering algorithm
- k-means
- hierarchical clustering
- text classifiers
- text categorization
- document collections
- text documents
- text mining
- clustering method
- uncertain data
- document classification
- unsupervised learning
- information retrieval systems
- clustering quality
- naive bayes
- topic models
- wordnet
- text data
- naive bayes classifier
- feature space
- data structure