A Sublinear-Regret Reinforcement Learning Algorithm on Constrained Markov Decision Processes with reset action.
Takashi Watanabe, Takashi Sakuragawa
Published in: ICMLSC (2020)
Keyphrases
- markov decision processes
- expected reward
- action space
- total reward
- decision theoretic planning
- discounted reward
- reward function
- state space
- optimal policy
- reinforcement learning
- finite state
- action sets
- partially observable
- average reward
- reinforcement learning algorithms
- dynamic programming
- factored mdps
- policy iteration
- average cost
- decision processes
- planning under uncertainty
- lower bound
- risk sensitive
- finite horizon
- markov decision process
- stochastic games
- reachability analysis
- model based reinforcement learning
- transition matrices
- bayes risk
- initial state
- infinite horizon
- action selection
- game theory
- interval estimation