Adaptive Proximal Policy Optimization with Upper Confidence Bound.
Ziqi Zhang
Jingzehua Xu
Zifeng Zhuang
Jinxin Liu
Donglin Wang
Published in:
CoRR (2023)
Keyphrases
upper confidence bound
contextual bandit
optimization algorithm
global optimization
optimization process
probabilistic model
optimization method