Predictable Interval MDPs through Entropy Regularization.
Menno van Zutphen, Giannis Delimpaltadakis, Maurice Heemels, Duarte Antunes
Published in: CoRR (2024)
Keyphrases
- markov decision processes
- grey relation
- reinforcement learning
- state space
- information theory
- mutual information
- optimal policy
- information theoretic
- factored mdps
- regularization parameter
- average cost
- finite horizon
- information geometry
- multiple attribute decision making
- average reward
- markov decision process
- information entropy
- prior information
- dynamic programming
- policy iteration
- semi-markov decision processes
- state and action spaces
- factored markov decision processes
- shannon entropy
- markov decision problems
- initial state
- partially observable
- sufficient conditions