Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation.
Liyuan Xu
Heishiro Kanagawa
Arthur Gretton
Published in:
NeurIPS (2021)
Keyphrases
learning algorithm
reinforcement learning
objective function
least squares
Bayesian networks
TD learning
dynamic programming
Markov chain
learning tasks
statistical learning
model free