Markov decision processes with time-varying discount factors and random horizon.
Rocio Ilhuicatzi-Roldán, Hugo Cruz-Suárez, Selene Chávez-Rodríguez
Published in: Kybernetika (2017)
Keyphrases
- markov decision processes
- state space
- dynamic programming
- reinforcement learning
- optimal policy
- finite state
- policy iteration
- discount factor
- reachability analysis
- transition matrices
- partially observable
- average reward
- infinite horizon
- reinforcement learning algorithms
- finite horizon
- average cost
- planning under uncertainty
- risk sensitive
- markov decision process
- model based reinforcement learning
- factored mdps
- decision theoretic planning
- state and action spaces
- discounted reward
- action sets
- stationary policies
- data mining