RUDDER: Return Decomposition for Delayed Rewards.
Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Sepp Hochreiter
Published in: CoRR (2018)
Keyphrases
- markov decision processes
- reinforcement learning
- decomposition algorithm
- decomposition method
- multi armed bandits
- bandit problems
- free riding
- image decomposition
- decomposition methods
- three dimensional
- neural network
- artificial neural networks
- wavelet packet
- expert systems
- wide range
- multiscale
- information retrieval
- database