Adaptive Step-size Policy Gradients with Average Reward Metric.
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimoto
Published in: ACML (2010)
Keyphrases
- step size
- average reward
- optimal policy
- variable step size
- policy iteration
- temporal difference
- actor critic
- Markov decision processes
- convergence rate
- gradient method
- long run
- cost function
- convergence speed
- reinforcement learning
- discounted reward
- semi-Markov decision processes
- policy gradient
- total reward
- model free
- optimality criterion
- state space
- state-action
- state and action spaces
- Markov chain
- dynamic programming
- reward function
- average cost
- partially observable Markov decision processes
- infinite horizon
- finite state
- function approximation
- decision problems
- multi objective
- multiresolution