Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics.
Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
Published in: EMNLP (1) (2020)
Keyphrases
- training dataset
- synthetic datasets
- benchmark datasets
- text classification tasks
- training set
- UCI datasets
- restricted Boltzmann machine
- high dimensional datasets
- recurrent networks
- real life
- text classification
- image dataset
- class imbalanced data
- ground truth labels
- linear SVM
- database
- training and testing data
- massive datasets
- training data
- training samples
- dynamical systems
- text classifiers
- million images
- dynamic model
- microarray datasets
- training algorithm
- computer graphics
- test set
- feature vectors
- neural network
- standard learning algorithms
- artificial and real world datasets
- recurrent neural networks
- decision trees
- training process
- high dimensional
- supervised learning
- photo collections
- object detectors
- training phase
- training examples