One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention.
Arvind Mahankali, Tatsunori B. Hashimoto, Tengyu Ma
Published in: CoRR (2023)
Keyphrases
- optimal linear
- worst case
- closed form
- dynamic programming
- piecewise linear
- contextual information
- learning environment
- post processing
- cost function
- semi infinite programming
- closed form solutions
- optimal solution
- computational complexity
- e learning
- learner model
- multi layer
- optimal design
- neural network
- mobile robot
- minimum error
- machine learning
- genetic algorithm
- learning algorithm
- linear model
- context dependent
- learning materials
- loss function
- search algorithm
- context aware