Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward.
Washim Uddin Mondal, Vaneet Aggarwal
Published in: Trans. Mach. Learn. Res. (2023)
Keyphrases
- reinforcement learning
- state space
- function approximation
- eligibility traces
- model free
- reinforcement learning algorithms
- temporal difference
- partially observable environments
- reward function
- optimal policy
- dynamic programming
- multi agent
- learning algorithm
- learning problems
- reward shaping
- action selection
- peer to peer
- machine learning
- average reward
- reinforcement learning methods
- policy gradient
- multi agent reinforcement learning
- partially observable
- data sets
- total reward
- markov decision process
- sufficient conditions
- supervised learning
- markov decision processes