Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data.
Alycia Lee
Brando Miranda
Sanmi Koyejo
Published in: CoRR (2023)
Keyphrases
data sets
data analysis
computer vision
high quality
data points
training examples
training data
data sources
motion estimation