Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections.
Yi-An Lai, Xuan Zhu, Yi Zhang, Mona T. Diab
Published in: LREC (2020)
Keyphrases
- text collections
- text categorization
- information retrieval
- document collections
- text documents
- text retrieval
- textual data
- data mining
- digital libraries
- active learning
- nearest neighbor
- inverted index
- index structure
- image representation
- database management systems
- multi-dimensional
- database
- data points
- training data
- metadata
- feature selection
- neural network
- databases