Deep Reinforcement Learning by Balancing Offline Monte Carlo and Online Temporal Difference Use Based on Environment Experiences.
Chayoung Kim
Published in: Symmetry (2020)
Keyphrases
- temporal difference
- monte carlo
- reinforcement learning
- td learning
- policy evaluation
- temporal difference learning
- reinforcement learning algorithms
- monte carlo methods
- importance sampling
- markov chain
- policy iteration
- temporal difference methods
- monte carlo tree search
- function approximation
- evaluation function
- actor critic
- particle filter
- model free
- function approximators
- variance reduction
- stochastic approximation
- radial basis function
- learning experience
- reinforcement learning problems
- state space
- search space