Data smells: categories, causes and consequences, and detection of suspicious data in AI-based systems.
Harald Foidl, Michael Felderer, Rudolf Ramler
Published in: CAIN (2022)
Keyphrases
- data sets
- synthetic data
- complex data
- experimental data
- training data
- data structure
- data analysis
- statistical analysis
- data distribution
- data points
- data processing
- high quality
- data sources
- data mining
- small number
- sensor data
- missing data
- data collection
- computer systems
- machine learning
- database
- data acquisition
- website
- high dimensional data
- detection method
- input data
- distributed systems
- knowledge discovery
- probability distribution
- prior knowledge
- knowledge base