A Robust Algorithm to Unifying Offline Causal Inference and Online Multi-armed Bandit Learning.

Qiao Tang, Hong Xie

Published in: ICDM (2021)

Keyphrases

learning algorithm
online learning
causal inference
learning process
multi-armed bandit
objective function
probabilistic model
reinforcement learning
optimal solution
worst case
maximum entropy