Model-Free Trajectory-based Policy Optimization with Monotonic Improvement.
Riad Akrour, Abbas Abdolmaleki, Hany Abdulsamad, Jan Peters, Gerhard Neumann
Published in: J. Mach. Learn. Res. (2018)
Keyphrases
- model free
- policy iteration
- reinforcement learning
- policy evaluation
- function approximation
- temporal difference
- average reward
- reinforcement learning algorithms
- optimal policy
- constrained optimization
- RL algorithms
- neural network
- pattern recognition
- text mining
- decision trees
- action selection
- data mining
- Markov decision problems