
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data.

Alycia Lee, Brando Miranda, Sanmi Koyejo
Published in: CoRR (2023)
Keyphrases
  • data sets
  • data analysis
  • computer vision
  • high quality
  • data points
  • training examples
  • training data
  • data sources
  • motion estimation