Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach.
Huy Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski
Published in: CoRR (2024)
Keyphrases
- data sets
- database
- learning algorithm
- data processing
- high dimensional data
- data analysis
- prior knowledge
- spatial data
- learning models
- synthetic data
- background knowledge
- data collection
- raw data
- original data
- data objects
- spectral clustering
- data points
- multidimensional data
- fuzzy clustering
- hidden variables
- categorical data
- clustering analysis
- missing data
- knowledge acquisition
- knowledge discovery
- learning process
- training data
- statistical analysis
- self organizing maps
- unsupervised learning
- data clustering
- online learning
- data mining techniques
- supervised learning
- data quality
- semi supervised
- end users
- k means