Improving the text classification using clustering and a novel HMM to reduce the dimensionality.
A. Seara Vieira, María Lourdes Borrajo Diz, Eva Lorenzo Iglesias
Published in: Comput. Methods Programs Biomed. (2016)
Keyphrases
- text classification
- hidden Markov models
- high dimensionality
- feature selection
- bag of words
- data clustering
- unsupervised learning
- topic discovery
- clustering algorithm
- k-means
- text categorization
- clustering method
- high dimensional
- data sets
- data cleaning
- distributional clustering
- n-gram
- text mining
- kNN
- naive Bayes
- dimensionality reduction
- multi-label
- information theoretic
- high dimensional data sets
- document clustering
- subspace clustering
- machine learning
- databases
- high dimensional data space
- neural network
- data mining
- text classifiers
- semantic features
- text data
- language modeling
- cluster analysis
- text documents
- self-organizing maps
- labeled data