Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem.
Jules Kruijswijk
Petri Parvinen
Maurits Kaptein
Published in:
CoRR (2019)
Keyphrases
policy evaluation
least squares
temporal difference
reinforcement learning
model free
matrix inversion
monte carlo
markov decision processes
function approximation
policy iteration
variance reduction
text classification
markov chain
fixed point
statistical inference