Manipulating Large Corpora for Text Classification.
Fumiyo Fukumoto, Yoshimi Suzuki
Published in: EMNLP (2002)
Keyphrases
- text classification
- text data
- training corpus
- bag of words
- text categorization
- text mining
- machine learning
- text documents
- feature selection
- naive bayes
- labeled data
- multi label
- natural language processing
- text corpora
- n gram
- document classification
- text classifiers
- data cleaning
- knn
- unlabeled data
- sentiment analysis
- language modeling
- cross lingual
- vector space
- learning algorithm