Best Policy Identification in Linear MDPs.
Jerome Taupin, Yassir Jedra, Alexandre Proutière
Published in: CoRR (2022)
Keyphrases
- optimal policy
- markov decision processes
- markov decision process
- finite horizon
- markov decision problems
- reward function
- state space
- policy search
- infinite horizon
- average cost
- average reward
- partially observable
- reinforcement learning
- decision processes
- linear systems
- action space
- decision problems
- sufficient conditions
- finite state
- multi agent
- policy evaluation
- approximate dynamic programming
- neural network
- factored mdps
- state and action spaces