Policy Gradients for Contextual Bandits.
Feiyang Pan
Qingpeng Cai
Pingzhong Tang
Fuzhen Zhuang
Qing He
Published in:
CoRR (2018)
Keyphrases
optimal policy
contextual information
stochastic systems
action selection
policy making
real time
real world
context sensitive
expected cost
Markov decision process