Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design.
Andrew Wagenmaker
Kevin G. Jamieson
Published in:
NeurIPS (2022)
Keyphrases
optimal policy
markov decision processes
finite horizon
state space
reinforcement learning
decision problems
dynamic programming
average cost
markov decision problems
markov decision process
long run
multistage
infinite horizon
policy iteration
average reward
dynamic programming algorithms
initial state
lost sales
develop a mathematical model
sufficient conditions
state dependent
linear programming