Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development.
Daoyuan Chen, Haibin Wang, Yilun Huang, Ce Ge, Yaliang Li, Bolin Ding, Jingren Zhou
Published in: CoRR (2024)
Keyphrases
- data sets
- data analysis
- data sources
- raw data
- data collection
- data model
- complex data
- database
- image data
- probability distribution
- data structure
- computer systems
- sensor data
- missing data
- query language
- decision trees
- data processing
- object oriented
- small number
- statistical analysis
- synthetic data
- training data
- data points
- search engine
- data objects
- data quality
- real world