Self-Punishment and Reward Backfill for Deep Q-Learning

Mohammad Reza Bonyadi, Rui Wang, Maryam Ziaei

Published in: IEEE Trans. Neural Networks Learn. Syst. (2023)

Keyphrases

reinforcement learning
agent receives
reward function
eligibility traces
learning agent
state space
reinforcement learning algorithms
function approximation
model free
state action
Markov decision processes
learning algorithm
discounted reward
optimal policy
cooperative
multi agent
average reward
learning rate
Markov decision process
temporal difference learning
TD learning
machine learning
action selection
dynamic programming
function approximators
multi agent systems
bucket brigade