Optimistic and Pessimistic Actor in RL: Decoupling Exploration and Utilization.
Jingpu Yang, Qirui Zhao, Helin Wang, Yuxiao Huang, Zirui Song, Miao Fang
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- exploration strategy
- autonomous learning
- action selection
- exploration exploitation
- input output
- Markov decision processes
- exploration exploitation tradeoff
- multi agent
- neural network
- reinforcement learning algorithms
- learning algorithm
- real valued
- bandit problems
- multiple robots
- state space
- information retrieval
- machine learning