Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate.
Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu
Published in: CoRR (2024)
Keyphrases
- learning rate
- convergence rate
- reinforcement learning
- learning algorithm
- error function
- recurrent neural networks
- convergence speed
- adaptive learning rate
- search space
- rapid convergence
- convergence theorem
- high accuracy
- Markov decision processes
- model free
- multilayer neural networks
- BP neural network algorithm
- delta bar delta