Data distribution debugging in machine learning pipelines.
Stefan Grafberger, Paul Groth, Julia Stoyanovich, Sebastian Schelter
Published in: VLDB J. (2022)
Keyphrases
- data distribution
- machine learning
- data streams
- high dimensional data
- data mining
- index structure
- data points
- training instances
- communication cost
- computer vision
- concept drift
- decision trees
- pattern recognition
- distributed data
- decision boundary
- skyline queries
- image data
- model based diagnosis
- neural network
- streaming data
- data skew
- multi dimensional data
- learning tasks
- multi dimensional
- active learning
- feature space
- data analysis
- feature selection
- learning algorithm