Hidden Markov Models for Text Categorization in Multi-Page Documents.
Paolo Frasconi, Giovanni Soda, Alessandro Vullo
Published in: J. Intell. Inf. Syst. (2002)
Keyphrases
- hidden markov models
- text categorization
- text documents
- automatic text categorization
- document classification
- training documents
- term frequency
- text classifiers
- text collections
- text classification
- feature selection
- knn
- website
- keywords
- distributional clustering
- document collections
- semi-supervised learning
- document clustering
- k nearest neighbor
- vector space model
- information retrieval systems
- information retrieval
- unlabeled data
- tf-idf
- document representation
- viterbi algorithm
- web pages
- neural network
- document retrieval
- active learning
- decision trees