The Fault in Our Data Stars: Studying Mitigation Techniques against Faulty Training Data in Machine Learning Applications.
Abraham Chan, Arpan Gujarati, Karthik Pattabiraman, Sathish Gopalakrishnan
Published in: DSN (2022)
Keyphrases
- training data
- data sets
- machine learning
- input data
- raw data
- high dimensional data
- data analysis
- supervised learning
- data distribution
- high quality
- original data
- learning algorithm
- data samples
- data collection
- decision trees
- data mining
- test data
- database
- multiple faults
- incomplete data
- learning tasks
- machine learning methods
- fault diagnosis
- missing data
- synthetic data
- training samples
- statistical analysis
- support vector machine
- data points