Estimating Optimal Policy Value in General Linear Contextual Bandits.

Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

Published in: CoRR (2023)

Keyphrases

optimal policy
finite horizon
markov decision processes
state space
special case
infinite horizon
long run
multistage
reinforcement learning
decision problems
state dependent
dynamic programming
average cost
average reward
control policies
utility function
sufficient conditions
lost sales
bayesian reinforcement learning
serial inventory systems