Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems.
Ekaterina I. Tolstaya, Alec Koppel, Ethan Stump, Alejandro Ribeiro
Published in: ACC (2018)
Keyphrases
- Markov decision problems
- continuous state spaces
- state space
- reinforcement learning
- action space
- optimal policy
- linear programming
- stochastic shortest path
- partially observable
- policy iteration
- decision theoretic
- Markov decision processes
- decision processes
- utility function
- infinite horizon
- dynamic programming
- objective function
- transition probabilities
- function approximation
- queueing networks
- reinforcement learning algorithms
- cost function
- stochastic processes
- expected utility
- learning algorithm
- decision problems
- linear program
- average cost
- temporal difference
- decision theory
- state variables
- function approximators
- multi agent
- multistage