Mask Atari for Deep Reinforcement Learning as POMDP Benchmarks.
Yang Shao, Quan Kong, Tadayuki Matsumura, Taiki Fuji, Kiyoto Ito, Hiroyuki Mizuno
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- linear value function approximation
- markov decision problems
- reinforcement learning algorithms
- function approximation
- continuous state
- partially observable
- state space
- reinforcement learning problems
- hidden state
- markov decision processes
- partially observable markov decision processes
- optimal policy
- control problems
- model free reinforcement learning
- model free
- markov games
- reinforcement learning methods
- learning algorithm
- multi agent
- optimal control
- policy evaluation
- linear programming
- partially observable markov decision process
- machine learning
- reward function
- action selection
- transfer learning
- supervised learning
- dynamic programming
- cooperative