Text categorization for multi-page documents: a hybrid naive Bayes HMM approach.
Paolo Frasconi, Giovanni Soda, Alessandro Vullo
Published in: JCDL (2001)
Keyphrases
- text categorization
- naive Bayes
- text classifiers
- text documents
- document classification
- automatic text categorization
- text classification
- term frequency
- feature selection
- logistic regression
- naive Bayes classifier
- classification algorithm
- multi-label
- term weighting
- information gain
- information retrieval systems
- keywords
- kNN
- web pages
- information retrieval
- document collections
- web documents
- tf-idf
- k-nearest neighbor
- document retrieval
- text data
- semi-supervised learning
- document clustering
- relevant documents
- text mining
- vector space model
- information extraction
- unlabeled data
- data mining