One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention.

Published in: CoRR (2023)

Keyphrases