A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning.
Andrew Patterson, Adam White, Sina Ghiassian, Martha White
Published in: CoRR (2021)
Keyphrases
- estimated parameters
- reinforcement learning
- multi-agent reinforcement learning
- state space
- piecewise linear
- model free
- reinforcement learning algorithms
- temporal difference learning
- multi-agent
- dynamic programming
- optimal policy
- Markov decision processes
- linear program
- function approximation
- search space
- learning process
- learning tasks
- learning algorithm