DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs.
Aayam Kumar Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern
Published in: ICLR (2021)
Keyphrases
- reinforcement learning
- markov decision processes
- markov decision problems
- sequential decision making problems
- state space
- optimal policy
- partially observable
- function approximation
- markov decision process
- semi markov decision processes
- reinforcement learning algorithms
- reinforcement learning agents
- continuous state and action spaces
- average reward
- control problems
- policy search
- model free
- factored markov decision processes
- policy iteration
- real time
- reward function
- learning algorithm
- temporal difference
- multi agent
- state and action spaces
- stochastic shortest path
- transition model
- model based reinforcement learning
- factored mdps
- linear programming
- least squares
- machine learning
- action sets
- decision theoretic
- policy evaluation
- reinforcement learning methods
- control policy
- action space