Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor.
Julien Grand-Clément, Marek Petrik
Published in: NeurIPS (2023)
Keyphrases
- average cost
- discount factor
- Markov decision processes
- average reward
- optimal policy
- discounted reward
- Markov decision problems
- infinite horizon
- long run
- finite state
- stationary policies
- state space
- reinforcement learning
- finite horizon
- dynamic programming
- finite number
- linear programming
- optimal control
- policy iteration
- multistage
- initial state
- partially observable
- Markov decision process
- total cost
- sufficient conditions
- Markov chain
- planning under uncertainty
- decision problems
- reward function
- sample path
- linear program
- reinforcement learning algorithms
- model free