Self-Influence Guided Data Reweighting for Language Model Pre-training.

Megh Thakkar, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha Talukdar
Published in: CoRR (2023)
Keyphrases
  • language model
  • n-gram
  • language modeling
  • information retrieval
  • training data
  • training set
  • probability distribution
  • query expansion
  • mixture model
  • test collection
  • uncertain data
  • ad hoc information retrieval