Using Cluster-Based Sampling to Select Initial Training Set for Active Learning in Text Classification.
Jaeho Kang, Kwang Ryel Ryu, Hyuk-Chul Kwon
Published in: PAKDD (2004)
Keyphrases
- text classification
- active learning
- random sampling
- sampling strategies
- labeled data
- representative samples
- machine learning
- bag of words
- unlabeled data
- feature selection
- sample selection
- text categorization
- text mining
- stratified sampling
- active sampling
- transfer learning
- selective sampling
- text classification tasks
- n-gram
- text documents
- text classifiers
- text data
- pool-based active learning
- semi-supervised
- naive bayes
- data cleaning
- learning algorithm
- knn
- monte carlo
- semantic features
- selection algorithm
- training examples
- semi-supervised learning
- sampling algorithm
- sampling strategy
- learning strategies
- experimental design
- multi-label
- feature extraction
- training data
- training set
- sample size
- supervised learning
- sampling methods
- sentiment analysis
- learning process
- batch mode