Provably learning a multi-head attention layer.

Sitan Chen, Yuanzhi Li
Published in: CoRR (2024)