TC-DCA: a system for text classification based on document's content allocation.
Wenbo Li, Le Sun, Zhenzhong Zhang, Xue Jiang, Weiru Zhang
Published in: CIKM (2010)
Keyphrases
- text classification
- text documents
- document classification
- text classifiers
- term frequency
- web documents
- multimedia documents
- document content
- textual content
- relevant content
- topic discovery
- bag of words
- automatic text classification
- text categorization
- feature selection
- information retrieval systems
- automatic text categorization
- semantic information
- pdf files
- naive bayes
- n-gram
- content and structure
- document collections
- metadata
- document representation
- training documents
- text content
- labeled data
- document structure
- web content
- text mining
- knn
- artificial intelligence
- information retrieval
- structured documents
- data cleaning
- user generated content
- machine learning
- resource allocation
- multimedia
- semantic features
- vector space model
- query expansion
- natural language