RUDDER: Return Decomposition for Delayed Rewards.
Jose A. Arjona-Medina
Michael Gillhofer
Michael Widrich
Thomas Unterthiner
Johannes Brandstetter
Sepp Hochreiter
Published in:
NeurIPS (2019)
Keyphrases
reinforcement learning
decomposition method
image decomposition
neural network
decomposition algorithm
long term and short term
three dimensional
markov decision processes
data mining
machine learning
knowledge base
artificial neural networks
user interface
wavelet packet
bandit problems
multiarmed bandit