Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting.
Ziping Xu, Ambuj Tewari
Published in: NeurIPS (2020)
Keyphrases
- factored MDPs
- reinforcement learning
- regret bounds
- multi-armed bandit
- markov decision processes
- approximate dynamic programming
- state space
- lower bound
- upper bound
- policy iteration
- markov decision problems
- context-specific
- reinforcement learning algorithms
- optimal policy
- online learning
- learning algorithm
- transition model
- model-free
- function approximation
- dynamic programming
- temporal difference
- linear regression
- finite state
- linear program
- machine learning
- partially observable
- basis functions
- action space
- fixed point
- dynamical systems
- least squares