Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space.
Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu
Published in: CoRR (2020)
Keyphrases
- action space
- Markov decision processes
- policy evaluation
- reinforcement learning
- state space
- policy iteration
- real valued
- least squares
- temporal difference
- finite state
- single agent
- dynamic programming
- action selection
- Monte Carlo
- reinforcement learning algorithms
- stochastic processes
- average reward
- optimal policy
- Markov chain
- optimal solution