Assessing the Impact of Class-Imbalanced Data for Classifying Relevant/Irrelevant Medline Documents.
Reyes Pavón, Rosalía Laza, Miguel Reboiro-Jato, Florentino Fdez-Riverola
Published in: PACBB (2011)
Keyphrases
- class imbalanced data
- information retrieval
- highly relevant
- latent semantic indexing
- document collections
- feature selection
- information retrieval systems
- metadata
- web documents
- text documents
- query topic
- irrelevant documents
- document clustering
- vector space model
- text mining
- semantic relationships
- document classification
- patent documents
- bibliographic databases
- relevant content
- text classification
- document representation
- vector space
- relevant documents
- retrieved documents
- potentially relevant
- search history
- automatic text classification
- keywords
- scientific literature
- biomedical literature
- document analysis
- medical domain
- multi document summarization
- text classifiers
- free text
- document retrieval
- question answering
- text categorization