Transformers learn to implement preconditioned gradient descent for in-context learning.
Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra
Published in: NeurIPS (2023)
Keyphrases
- supervised learning
- learning tasks
- learning rules
- learning systems
- learning problems
- learning process
- cost function
- online learning
- knowledge acquisition
- efficient learning
- learning algorithm
- contextual information
- iterative learning
- unsupervised manner
- context sensitive
- incremental learning
- context aware
- active learning
- reinforcement learning