Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation.
Juhwan Choi, Jungmin Yun, Kyohoon Jin, YoungBin Kim
Published in: CoRR (2024)
Keyphrases
- data sets
- data analysis
- cost efficient
- database
- original data
- raw data
- data points
- image data
- data processing
- prior knowledge
- clustering algorithm
- high quality
- privacy preserving
- knowledge discovery
- probability distribution
- data structure
- data collection
- synthetic data
- missing data
- data distribution
- multimedia data
- news articles
- data sources