GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values.
Farnoosh Javadi, Walid Ahmed, Habib Hajimolahoseini, Foozhan Ataiefard, Mohammad Hassanpour, Saina Asani, Austin Wen, Omar Mohamed Awad, Kangling Liu, Yang Liu
Published in: CoRR (2023)