Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design.
Andrew Wagenmaker, Kevin G. Jamieson
Published in: NeurIPS (2022)
Keyphrases
- optimal policy
- Markov decision processes
- finite horizon
- state space
- reinforcement learning
- decision problems
- dynamic programming
- average cost
- Markov decision problems
- Markov decision process
- long run
- multistage
- infinite horizon
- policy iteration
- average reward
- dynamic programming algorithms
- initial state
- lost sales
- develop a mathematical model
- sufficient conditions
- state dependent
- linear programming