Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources.
Feiyang Kang, Hoang Anh Just, Anit Kumar Sahu, Ruoxi Jia
Published in: NeurIPS (2023)
Keyphrases
- data sources
- data sets
- data quality
- synthetic data
- high quality
- statistical methods
- knowledge discovery
- small number
- data processing
- raw data
- data acquisition
- statistical analysis
- computer systems
- database
- probability distribution
- dynamic programming
- prior knowledge
- machine learning
- historical data
- input data
- multiple sources
- complex data
- databases
- noisy data
- original data
- decision trees
- training data
- application domains
- sensor data
- data analysis
- labeled data