Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors.
Dmitri A. Dolgov, Edmund H. Durfee
Published in: IJCAI (2005)
Keyphrases
- markov decision processes
- stationary policies
- reinforcement learning
- optimal policy
- reward function
- average cost
- markov decision process
- state space
- finite horizon
- machine learning
- total cost
- control policies
- total reward
- fully observable
- markov decision problems
- average reward
- infinite horizon
- finite number
- optimal control
- non stationary
- sufficient conditions