Balanced Data Sampling for Language Model Training with Clustering.
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
Published in:
ACL (Findings) (2024)
Keyphrases
language model
information retrieval
data points
document retrieval
context sensitive
spectral clustering
keywords
n-gram
continuous data