On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process.
Rahul Misra, Rafal Wisniewski, Carsten Skovmose Kallesøe
Published in: CoRR (2023)
Keyphrases
- markov decision process
- reinforcement learning
- state action
- temporal difference learning
- state space
- average cost
- markov decision processes
- optimal policy
- stationary policies
- semi markov decision process
- infinite horizon
- finite horizon
- markov games
- average reward
- initial state
- policy iteration
- action space
- reward function
- function approximation
- linear program
- optimal solution
- learning algorithm
- partial observability
- heuristic search
- transition probabilities
- markov chain
- reinforcement learning algorithms
- dynamic programming
- action selection
- finite state
- partially observable
- stochastic games
- probability distribution
- model free
- reward shaping