Randomised Procedures for Initialising and Switching Actions in Policy Iteration.
Shivaram Kalyanakrishnan, Neeldhara Misra, Aditya Gopalan
Published in: AAAI (2016)
Keyphrases
- policy iteration
- markov decision processes
- fixed point
- model free
- optimal policy
- partially observable
- reinforcement learning
- sample path
- action space
- average reward
- markov decision process
- finite state
- infinite horizon
- least squares
- temporal difference
- state and action spaces
- reward function
- linear programming
- policy evaluation
- action selection
- dynamic programming
- convergence rate
- reinforcement learning algorithms
- markov decision problems
- decision theoretic
- decision processes
- situation calculus
- function approximation
- monte carlo
- average cost
- initial state
- multiple agents
- machine learning