Decoupling Value and Policy for Generalization in Reinforcement Learning.
Roberta Raileanu, Rob Fergus
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- markov decision process
- action selection
- partially observable environments
- policy gradient
- partially observable
- function approximation
- markov decision processes
- state and action spaces
- actor critic
- policy evaluation
- control policy
- dynamic programming
- action space
- reinforcement learning problems
- approximate dynamic programming
- average reward
- input output
- state space
- control policies
- model free
- markov decision problems
- function approximators
- temporal difference
- state action
- continuous state
- reinforcement learning algorithms
- reward function
- infinite horizon
- agent learns
- partially observable domains
- multi agent
- decision making
- policy iteration
- reinforcement learning methods
- rl algorithms
- machine learning
- partially observable markov decision processes
- asymptotically optimal
- policy gradient methods
- agent receives
- learning process
- decision problems
- robotic control
- learning problems
- long run
- continuous state spaces
- finite state
- control problems
- natural actor critic
- state dependent