Sparse randomized policies for Markov decision processes based on Tsallis divergence regularization.
Pierre Leleux, Bertrand Lebichot, Guillaume Guex, Marco Saerens. Published in: Knowl. Based Syst. (2024)
Keyphrases
- Markov decision processes
- optimal policy
- Markov decision process
- decision processes
- average cost
- reward function
- total reward
- decentralized control
- reinforcement learning
- finite state
- state space
- stationary policies
- discounted reward
- dynamic programming
- finite horizon
- transition matrices
- policy iteration
- macro actions
- decision problems
- expected reward
- policy iteration algorithm
- partially observable Markov decision processes
- infinite horizon
- information theory
- Markov decision problems
- long run
- reinforcement learning algorithms
- planning under uncertainty
- partially observable
- factored MDPs
- action space
- multistage
- control policies
- decision theoretic planning
- average reward
- action sets
- reachability analysis
- initial state
- sufficient conditions
- regularization parameter
- model based reinforcement learning
- semi-Markov decision processes