Recurrent Model-Free RL is a Strong Baseline for Many POMDPs.
Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov
Published in: CoRR (2021)
Keyphrases
- model free
- reinforcement learning
- reinforcement learning algorithms
- function approximation
- temporal difference
- partially observable markov decision processes
- rl algorithms
- policy iteration
- state space
- policy evaluation
- markov decision processes
- partially observable
- average reward
- optimal policy
- multi agent
- reinforcement learning methods
- linear combination
- dynamic programming
- continuous state
- policy search
- action selection
- learning process
- learning algorithm
- neural network