Self-Punishment and Reward Backfill for Deep Q-Learning.
Mohammad Reza Bonyadi, Rui Wang, Maryam Ziaei
Published in: IEEE Trans. Neural Networks Learn. Syst. (2023)
Keyphrases
- reinforcement learning
- agent receives
- reward function
- eligibility traces
- learning agent
- state space
- reinforcement learning algorithms
- function approximation
- model free
- state action
- markov decision processes
- learning algorithm
- discounted reward
- optimal policy
- cooperative
- multi agent
- average reward
- learning rate
- markov decision process
- temporal difference learning
- td learning
- machine learning
- action selection
- dynamic programming
- function approximators
- multi agent systems
- bucket brigade