The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators.
Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala
Published in: CoRR (2024)
Keyphrases
- synthetic data
- data sets
- image data
- raw data
- application domains
- data analysis
- original data
- input data
- complex data
- high dimensional data
- data processing
- database
- ground truth
- data structure
- database systems
- information systems
- data mining techniques
- knowledge discovery
- data collection
- probability distribution
- active learning
- missing data
- data mining algorithms
- experimental data
- high quality
- semi automated