Bellman-consistent Pessimism for Offline Reinforcement Learning.
Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- function approximation
- real time
- machine learning
- state action
- temporal difference learning
- state space
- globally optimal
- learning algorithm
- multi agent
- reinforcement learning algorithms
- linear program
- model free
- piecewise linear
- least squares
- consistency constraints
- markov decision processes
- dynamic programming
- control system
- information retrieval
- neural network