Adaptive Proximal Policy Optimization with Upper Confidence Bound.

Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, Donglin Wang
Published in: CoRR (2023)
Keyphrases
  • upper confidence bound
  • contextual bandit
  • optimization algorithm
  • global optimization
  • optimization process
  • probabilistic model
  • optimization method