Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality.
Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
Published in: CoRR (2024)
Keyphrases
- learning process
- supervised learning
- online learning
- knowledge acquisition
- active learning
- artificial neural networks
- training process
- learning systems
- learning problems
- online training
- Bayesian networks
- learning speed
- structured prediction
- dynamic model
- unsupervised learning
- prior knowledge
- training set
- multi-agent
- reinforcement learning