Best Policy Identification in Linear MDPs.
Jérôme Taupin, Yassir Jedra, Alexandre Proutière
Published in: Allerton (2023)
Keyphrases
- optimal policy
- markov decision processes
- markov decision process
- markov decision problems
- reinforcement learning
- policy iteration
- average reward
- finite horizon
- reward function
- state space
- policy search
- partially observable
- action space
- dynamic programming
- state and action spaces
- decision processes
- average cost
- reinforcement learning algorithms
- linear systems
- infinite horizon
- initial state
- state dependent
- control policies
- continuous state spaces
- reinforcement learning problems
- factored mdps
- linear model
- semi markov decision processes