Safe Exploration by Solving Early Terminated MDP.
Hao Sun, Ziping Xu, Meng Fang, Zhenghao Peng, Jiadong Guo, Bo Dai, Bolei Zhou
Published in: CoRR (2021)
Keyphrases
- Markov decision problems
- Markov decision processes
- exploration strategy
- dynamic programming algorithms
- state space
- reinforcement learning
- utility function
- Markov decision process
- linear programming
- decision theoretic
- transition matrices
- data sets
- action selection
- finite state
- dynamic programming
- lower bound
- search engine