What Effects the Generalization in Visual Reinforcement Learning: Policy Consistency with Truncated Return Prediction.
Shuo Wang, Zhihao Wu, Xiaobo Hu, Jinwen Wang, Youfang Lin, Kai Lv. Published in: AAAI (2024)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- prediction accuracy
- Markov decision process
- action selection
- prediction model
- approximate dynamic programming
- prediction error
- function approximation
- action space
- state space
- visual features
- Markov decision processes
- control policies
- reinforcement learning problems
- prediction algorithm
- state and action spaces
- policy iteration
- Markov decision problems
- state action
- control policy
- policy gradient
- visual information
- partially observable domains
- policy evaluation
- actor critic
- partially observable
- average reward
- model free
- low level
- multi agent
- function approximators
- learning algorithm
- continuous state
- reward function
- transfer learning
- supervised learning
- dynamic programming
- high level