On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
Bruno Scherrer
Published in: CoRR (2012)
Keyphrases
- non stationary
- infinite horizon
- markov decision processes
- optimal policy
- finite horizon
- markov decision process
- average cost
- state space
- decision problems
- reinforcement learning
- long run
- finite state
- dynamic programming
- policy iteration
- holding cost
- partially observable
- decision processes
- partially observable markov decision processes
- total reward
- single item
- control policies
- reward function
- stationary policies
- markov decision problems
- average reward
- discount factor
- discounted reward
- lost sales
- planning under uncertainty
- policy iteration algorithm
- expected reward
- state dependent
- action space
- initial state
- reinforcement learning algorithms
- sufficient conditions
- multistage
- dec pomdps
- machine learning
- continuous state
- search space