Optimistic Policy Optimization with Bandit Feedback.
Lior Shani
Yonathan Efroni
Aviv Rosenberg
Shie Mannor
Published in:
ICML (2020)
Keyphrases
optimization algorithm
neural network
discrete optimization
real time
global optimization
optimization methods
direct search
database systems
search algorithm
dynamic programming
markov chain
sufficient conditions
optimal policy
optimization method
optimization process
feedback mechanisms