Average-Reward Reinforcement Learning for Variance Penalized Markov Decision Problems.
Makoto Sato, Shigenobu Kobayashi
Published in: ICML (2001)
Keyphrases
- average reward reinforcement learning
- markov decision problems
- optimal policy
- reinforcement learning
- state space
- markov decision processes
- decision problems
- dynamic programming
- infinite horizon
- policy iteration
- finite state
- least squares
- long run
- linear programming
- maximum likelihood
- multistage
- sufficient conditions
- decision theoretic
- loss function
- transition probabilities
- partially observable
- neural network
- average cost
- markov decision process
- decision processes
- linear program
- function approximators
- data mining