Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design.
Andrew Wagenmaker, Kevin Jamieson
Published in: CoRR (2022)
Keyphrases
- optimal policy
- markov decision processes
- finite horizon
- reinforcement learning
- state space
- markov decision process
- decision problems
- dynamic programming
- infinite horizon
- average reward
- average cost
- long run
- finite state
- policy iteration
- state dependent
- multistage
- initial state
- sufficient conditions
- markov decision problems
- machine learning
- function approximation
- linear programming
- decision processes
- search algorithm
- control policies