Optimizing Exploration-Exploitation Trade-off in Continuous Action Spaces via Q-ensemble.
Wei Xue, Haihong Zhang, Xueyu Wei, Tao Tao, Xue Li
Published in: PRICAI (3) (2022)
Keyphrases
- action space
- state space
- Markov decision processes
- continuous action
- real valued
- reinforcement learning
- control policies
- state and action spaces
- continuous state spaces
- continuous state
- action selection
- stochastic processes
- skill learning
- Markov decision process
- single agent
- learning algorithm
- Markov decision problems
- data mining
- state variables
- supervised learning
- probability distribution