Login / Signup

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention.

Arvind MahankaliTatsunori B. HashimotoTengyu Ma
Published in: CoRR (2023)
Keyphrases