Non-delusional Q-learning and value-iteration.
Tyler Lu, Dale Schuurmans, Craig Boutilier
Published in: NeurIPS (2018)
Keyphrases
- stochastic shortest path
- state space
- markov decision processes
- policy iteration
- optimal policy
- reinforcement learning
- reinforcement learning algorithms
- dynamic programming
- heuristic search
- model free
- function approximation
- state action
- markov decision process
- cooperative
- finite state
- markov decision problems
- discount factor
- decision problems
- multi agent
- infinite horizon
- stochastic approximation
- average reward
- learning rate
- continuous state spaces
- belief space
- factored mdps
- markov decision chains
- partially observable
- belief state
- decision theoretic
- markov chain
- learning algorithm
- neural network
- temporal difference learning
- action selection
- long run
- multi agent reinforcement learning
- dynamical systems
- data sets