Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs.
Beatris Escobedo-Trujillo, Carmen G. Higuera-Chan
Published in: Kybernetika (2019)
Keyphrases
- Markov decision processes
- state action
- average reward
- action space
- Markov decision process
- reinforcement learning
- average cost
- stochastic games
- reward function
- state space
- policy iteration
- finite state
- optimal policy
- dynamic programming
- finite horizon
- reinforcement learning algorithms
- evaluation function
- infinite horizon
- long run
- learning algorithm
- partially observable
- policy gradient
- model free