Optimal Policy for Bernoulli Bandits: Computation and Algorithm Gauge.
Sebastian Pilarski
Slawomir Pilarski
Dániel Varró
Published in:
IEEE Trans. Artif. Intell. (2021)
Keyphrases
optimal policy
dynamic programming
search space
computational complexity
cost function
probabilistic model
optimal solution
NP-hard
Markov decision processes
decision problems
dynamic programming algorithms
reinforcement learning
objective function
standard deviation
policy iteration