Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs.
Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov
Published in: ICML (2022)
Keyphrases
- model free
- reinforcement learning
- reinforcement learning algorithms
- function approximation
- state space
- RL algorithms
- average reward
- partially observable Markov decision processes
- temporal difference
- policy iteration
- partially observable
- dynamic programming
- Markov decision processes
- continuous state
- policy search
- learning problems
- reinforcement learning methods
- machine learning
- transfer learning
- stochastic games
- temporal difference learning
- support vector
- learning styles
- function approximators
- active learning