Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning.
Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Published in: NeurIPS (2018)
Keyphrases
- reinforcement learning
- optimal policy
- dynamic programming
- online learning
- greedy algorithm
- machine learning
- post processing
- policy search
- multi step
- search space
- search algorithm
- optimal control
- approximate policy iteration
- state space
- Markov chain
- real time
- model free
- feature selection
- Markov decision process
- batch mode
- learning algorithm
- reinforcement learning agents