SlimPajama-DC: Understanding Data Combinations for LLM Training.
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
Hongyi Wang
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric P. Xing
Published in:
CoRR (2023)
Keyphrases
data analysis
data collection
data sets
data sources
statistical analysis
synthetic data
high quality
data quality
knowledge discovery
image data
small number
raw data
neural network
xml documents
data processing