A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning.
Fanghui Huang, Xinyang Deng, Yixin He, Wen Jiang
Published in: Inf. Sci. (2023)
Keyphrases
- action selection
- reinforcement learning
- optimal policy
- action space
- partially observable domains
- confidence level
- exploration exploitation tradeoff
- Markov decision problems
- learning algorithm
- agent receives
- computational complexity
- optimal control
- learning process
- Markov decision process
- confidence measure
- association rules
- policy search
- exploration strategy
- agent learns
- state and action spaces
- reward shaping
- partially observable
- temporal difference
- state space