Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation.
Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K. Harris, A. Rupam Mahmood, Dale Schuurmans
Published in: CoRR (2024)