Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks.
Reinaldo Uribe Muriel, Fernando Lozano, Charles Anderson
Published in: CoRR (2015)
Keyphrases
- semi-Markov decision processes
- average reward
- Markov decision processes
- optimal policy
- long run
- optimality criterion
- discounted reward
- model free
- reinforcement learning
- policy iteration
- total reward
- dynamic programming
- Markov chain
- decision problems
- fixed point
- state space
- partially observable
- infinite horizon
- transfer learning
- supply chain
- search algorithm