Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs.
Nicolas Boizard
Kevin El Haddad
Céline Hudelot
Pierre Colombo
Published in:
CoRR (2024)
Keyphrases
multiscale
data structure
lower bound
case study
cooperative
pairwise
artificial neural networks
regression analysis
logit model