Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
Published in: CoRR (2024)
Keyphrases
language model
class imbalance
heavy-tailed
class distribution
active learning
cost-sensitive
generalized gaussian
probabilistic model
n-gram
feature selection
high dimensionality
information retrieval
mixture model
concept drift
prior distribution
supervised learning
pattern recognition
multi-class
high-dimensional