GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.

Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai
Published in: CoRR (2023)