Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms.
Matthia Sabatelli, Gilles Louppe, Pierre Geurts, Marco A. Wiering
Published in: CoRR (2019)
Keyphrases
- reinforcement learning algorithms
- reinforcement learning
- model free
- state space
- Markov decision processes
- reinforcement learning problems
- learning algorithm
- eligibility traces
- function approximation
- temporal difference
- reinforcement learning methods
- partially observable environments
- dynamic programming
- stochastic games
- dynamic environments
- training data
- neural network
- objective function
- Markov decision problems
- policy search