Blending Imitation and Reinforcement Learning for Robust Policy Improvement.
Xuefeng Liu, Takuma Yoneda, Rick Stevens, Matthew R. Walter, Yuxin Chen
Published in: ICLR (2024)
Keyphrases
- reinforcement learning
- optimal policy
- state space
- markov decision process
- action selection
- partially observable
- policy search
- function approximation
- reinforcement learning problems
- dynamic programming
- approximate dynamic programming
- markov decision problems
- state action
- computationally efficient
- control policy
- function approximators
- learning process
- temporal difference
- markov decision processes
- policy iteration
- learning algorithm
- control policies
- reward function
- actor critic
- multi agent
- state and action spaces