Multi-co-training for document classification using various document representations: TF-IDF, LDA, and Doc2Vec.
DongHwa Kim, Deokseong Seo, Suhyoun Cho, Pilsung Kang
Published in: Inf. Sci. (2019)
Keyphrases
- text documents
- tf idf
- document representation
- topic models
- text classification
- document clustering
- text mining
- text categorization
- unlabeled data
- bag of words
- semi supervised learning
- information extraction
- keywords
- named entities
- labeled data
- wordnet
- generative model
- semi supervised
- probabilistic model
- feature extraction
- vector space model
- face recognition
- feature selection
- n gram
- co occurrence
- databases
- principal component analysis
- dimensionality reduction
- knn
- active learning
- high dimensional
- multiscale
- information retrieval
- machine learning
- data mining