Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning.
Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh
Published in: CoRR (2019)
Keyphrases
- model free
- multi step
- reinforcement learning
- optimal policy
- hierarchical reinforcement learning
- reinforcement learning algorithms
- function approximation
- dynamic programming
- partially observable markov decision processes
- temporal difference
- markov decision processes
- markov decision process
- reward function
- policy iteration
- markov decision problems
- state space
- rl algorithms
- knn
- learning problems
- policy evaluation
- feature selection
- average cost
- average reward
- k nearest neighbor
- partially observable
- nearest neighbor
- action space
- data mining
- least squares
- supervised learning