Data-Juicer: A One-Stop Data Processing System for Large Language Models.
Daoyuan Chen
Yilun Huang
Zhijian Ma
Hesen Chen
Xuchen Pan
Ce Ge
Dawei Gao
Yuexiang Xie
Zhaoyang Liu
Jinyang Gao
Yaliang Li
Bolin Ding
Jingren Zhou
Published in:
CoRR (2023)
Keyphrases
data processing
language model
data analysis
knowledge discovery
data sources
n-gram
information retrieval
training data
statistical language models