CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning.
Yi Su
Lequn Wang
Michele Santacatterina
Thorsten Joachims
Published in:
CoRR (2018)
Keyphrases
learning algorithm
least squares
reinforcement learning
TD learning
temporal difference
multi-agent
statistical learning
policy evaluation
Markov chain