Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality (extended abstract).
Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
Published in: COLT (2024)
Keyphrases
- extended abstract
- learning algorithm
- supervised learning
- learning process
- reinforcement learning
- learning speed
- feedforward neural networks
- recurrent networks
- online learning
- prior knowledge
- knowledge acquisition
- contextual information
- motor skills
- learning machines
- training process
- learning problems
- learning tasks
- learning systems
- real time
- active learning
- evolutionary algorithm
- neural network