Estimating Optimal Policy Value in General Linear Contextual Bandits.
Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill
Published in: CoRR (2023)
Keyphrases
- optimal policy
- finite horizon
- Markov decision processes
- state space
- special case
- infinite horizon
- long run
- multistage
- reinforcement learning
- decision problems
- state dependent
- dynamic programming
- average cost
- average reward
- control policies
- utility function
- sufficient conditions
- lost sales
- Bayesian reinforcement learning
- serial inventory systems