RUDDER: Return Decomposition for Delayed Rewards.

Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter

Published in: NeurIPS (2019)

Keyphrases