Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs.

Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo
Published in: CoRR (2024)
Keyphrases
  • multiscale
  • data structure
  • lower bound
  • case study
  • cooperative
  • pairwise
  • artificial neural networks
  • regression analysis
  • logit model