Data-Juicer: A One-Stop Data Processing System for Large Language Models.

Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou
Published in: SIGMOD Conference Companion (2024)
Keyphrases
  • data processing
  • language model
  • data analysis
  • probabilistic model
  • data acquisition
  • xml documents
  • query processing
  • test collection
  • language modeling