Improving Exploration in Actor-Critic With Weakly Pessimistic Value Estimation and Optimistic Policy Optimization.

Published in: IEEE Trans. Neural Networks Learn. Syst. (2024)

Keyphrases