LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods.
Zhenhua Wang, Guang Xu, Ming Ren. Published in: CoRR (2024)
Keyphrases
- data mining methods
- data analysis
- data sets
- natural language
- synthetic data
- data mining techniques
- image data
- data processing
- database
- benchmark datasets
- incomplete data
- original data
- statistical analysis
- computational cost
- data mining applications
- data quality
- noisy data
- spectral clustering
- raw data
- statistical methods
- experimental data
- spatial data
- knowledge discovery
- data points
- high quality
- human experts
- attribute values
- computer systems
- data collection
- training data
- neural network
- multiple sources
- data reduction