Mixture-of-Depths: Dynamically allocating compute in transformer-based language models.

David Raposo, Samuel Ritter, Blake A. Richards, Timothy P. Lillicrap, Peter Conway Humphreys, Adam Santoro
Published in: CoRR (2024)