Data Portraits: Recording Foundation Model Training Data.
Marc Marone, Benjamin Van Durme
Published in: NeurIPS (2023)
Keyphrases
- training data
- data sets
- experimental data
- prior knowledge
- input data
- test data
- data collection
- raw data
- simulation data
- synthetic data
- data sources
- database
- computational model
- data processing
- data structure
- probabilistic model
- noisy data
- empirical data
- small number
- mathematical model
- training process
- data samples
- theoretical foundation
- prior information
- machine learning
- network structure
- high level
- learning algorithm
- decision trees
- statistical model
- data model
- statistical analysis
- model selection
- em algorithm
- high quality
- image data
- knowledge discovery
- data points
- domain knowledge