Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems.
Alec Koppel, Ekaterina I. Tolstaya, Ethan Stump, Alejandro Ribeiro
Published in: CoRR (2018)
Keyphrases
- Markov decision problems
- continuous state spaces
- state space
- reinforcement learning
- action space
- optimal policy
- linear programming
- stochastic shortest path
- policy iteration
- Markov decision processes
- partially observable
- transition probabilities
- decision theoretic
- utility function
- dynamic programming
- expected utility
- cost function
- decision processes
- average cost
- reward function
- linear program
- queueing networks
- learning agent
- Markov chain
- average reward
- infinite horizon
- state variables
- learning algorithm
- Bayesian networks