RUDDER: Return Decomposition for Delayed Rewards.

Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Sepp Hochreiter

Published in: CoRR (2018)

Keyphrases

Markov decision processes
reinforcement learning
decomposition algorithm
decomposition method
multi-armed bandits
bandit problems
free riding
image decomposition
decomposition methods
three-dimensional
neural network
artificial neural networks
wavelet packet
expert systems
wide range
multiscale
information retrieval
database