Constrained Policy Optimization for Controlled Contextual Bandit Exploration.
Mohammad Kachuee
Sungjin Lee
Published in:
AISafety@IJCAI (2022)
Keyphrases
contextual bandit
concave convex procedure
upper confidence bound
optimization problems
optimization method
constrained optimization
action selection
Lagrange multipliers
data mining
social media