On the influence of training data quality on text document classification using machine learning methods.
Jyri Saarikoski, Henry Joutsijoki, Kalervo Järvelin, Jorma Laurikkala, Martti Juhola
Published in: Int. J. Knowl. Eng. Data Min. (2015)
Keyphrases
- document classification
- data quality
- text classifiers
- text documents
- text mining
- web documents
- text categorization
- text classification
- document categorization
- quality management
- classification algorithm
- data cleansing
- data warehouse
- information retrieval
- training set
- keywords
- supervised learning
- knowledge discovery
- text analysis
- information extraction
- prior knowledge
- topic models
- natural language
- document clustering
- decision trees
- e-learning
- real world
- databases
- database