Decoupling Value and Policy for Generalization in Reinforcement Learning.
Roberta Raileanu, Rob Fergus
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- markov decision process
- action selection
- partially observable environments
- reinforcement learning problems
- control policies
- reinforcement learning algorithms
- function approximators
- partially observable
- function approximation
- reward function
- state and action spaces
- control policy
- markov decision problems
- markov decision processes
- approximate dynamic programming
- input output
- policy gradient
- state space
- partially observable markov decision processes
- continuous state spaces
- policy iteration
- average reward
- learning algorithm
- policy evaluation
- action space
- state action
- dynamic programming
- policy gradient methods
- model free
- actor critic
- decision problems
- agent learns
- partially observable domains
- continuous state
- temporal difference learning
- temporal difference
- infinite horizon
- long run
- finite state
- transfer learning
- least squares
- robotic control
- model free reinforcement learning
- agent receives
- neural network
- transition model
- rl algorithms
- control problems
- average cost
- radial basis function
- learning problems
- machine learning