Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result.
Paul Wagner. Published in: NIPS (2013)
Keyphrases
- policy iteration
- natural actor-critic
- reinforcement learning problems
- markov decision processes
- reinforcement learning
- average reward
- robot arm
- model-free
- temporal difference
- least squares
- neural network
- average cost
- optimal policy
- decision making
- optimal control
- cost function
- reinforcement learning methods
- search algorithm