Contextual Online Imitation Learning (COIL): Using Guide Policies in Reinforcement Learning.
Alexander Hill, Marc Groefsema, Matthia Sabatelli, Raffaella Carloni, Marco Grzegorczyk
Published in: ICAART (3) (2024)
Keyphrases
- imitation learning
- reinforcement learning
- optimal policy
- Markov decision process
- reward function
- state space
- reinforcement learning methods
- function approximation
- partially observable Markov decision processes
- machine learning
- reinforcement learning algorithms
- dynamic programming
- multi-agent
- temporal difference
- learning algorithm
- control problems
- learning process