Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers.

Rya Sanovar, Srikant Bharadwaj, Renée St. Amant, Victor Rühle, Saravan Rajmohan
Published in: CoRR (2024)
Keyphrases