A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion.
François Dufour, Alexandre Genadot
Published in: SIAM J. Control. Optim. (2020)
Keyphrases
- total reward
- Markov decision processes
- convex programming
- finite state
- stationary policies
- average reward
- optimal policy
- linear programming
- optimality criterion
- convex optimization
- state space
- reinforcement learning algorithms
- reinforcement learning
- dynamic programming
- policy iteration
- Markov decision process
- decision processes
- partially observable
- average cost
- action space
- partially observable Markov decision processes
- action selection
- special case
- lower bound