Sampling the Web as Training Data for Text Classification.
Wei-Yen Day, Chun-Yi Chi, Ruey-Cheng Chen, Pu-Jen Cheng
Published in: Int. J. Digit. Libr. Syst. (2010)
Keyphrases
- text classification
- training data
- labeled data
- labeled training data
- website
- web applications
- bag of words
- data sets
- feature selection
- naive bayes
- unlabeled data
- training corpus
- training set
- learning algorithm
- training documents
- text categorization
- machine learning
- text mining
- information sources
- semantic web
- text data
- web content
- n gram
- prior knowledge
- web technologies
- web pages
- multi label
- classification accuracy
- sampled data
- text classifiers
- web data
- web mining
- test data
- test set
- web documents
- training examples
- decision trees
- semi supervised learning
- end users
- text documents
- sentiment analysis
- training process
- training dataset
- data sources
- neural network
- database