A general Markov decision process formalism for action-state entropy-regularized reward maximization.
Dmytro Grytskyy, Jorge Ramírez-Ruiz, Rubén Moreno-Bote
Published in: CoRR (2023)
Keyphrases
- markov decision process
- state action
- state space
- reinforcement learning
- initial state
- action space
- markov decision processes
- reward function
- optimal policy
- special case
- evaluation function
- finite horizon
- situation calculus
- policy iteration
- average reward
- discounted reward
- state transitions
- infinite horizon
- transition probabilities
- state variables
- partial observability
- internal state
- action selection
- machine learning
- supply chain
- wide class
- dynamic programming
- action theories