Second Order Value Iteration in Reinforcement Learning.
Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- markov decision processes
- state space
- optimal policy
- markov decision process
- policy iteration
- reinforcement learning algorithms
- function approximation
- dynamic programming
- partially observable markov decision processes
- average reward
- higher order
- learning algorithm
- heuristic search
- partially observable
- model free
- finite state
- action space
- belief space
- reward function
- least squares
- multi agent
- multi agent reinforcement learning
- high order
- temporal difference
- search algorithm
- hessian matrix
- reinforcement learning methods
- fourth order
- data sets
- policy search
- average cost
- infinite horizon
- learning problems
- transfer learning
- multiscale
- genetic algorithm
- machine learning