Data Portraits: Recording Foundation Model Training Data.
Marc Marone, Benjamin Van Durme
Published in: CoRR (2023)
Keyphrases
- training data
- test data
- data sets
- prior knowledge
- experimental data
- input data
- probability distribution
- mathematical model
- probabilistic model
- data structure
- database
- raw data
- high level
- measured data
- data processing
- data collection
- noisy data
- prior information
- EM algorithm
- classification models
- data distribution
- sensor data
- theoretical foundation
- labeled data
- small number
- knowledge discovery
- simulation data
- missing data
- feature selection
- data samples
- training dataset
- learning models
- similarity measure
- high quality
- data quality
- data analysis
- feature space
- training set
- classification accuracy
- statistical model
- statistical analysis