Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward.
Washim Uddin Mondal, Vaneet Aggarwal
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- function approximation
- state space
- model free
- eligibility traces
- learning algorithm
- Markov decision processes
- reward function
- reinforcement learning algorithms
- temporal difference
- dynamic programming
- total reward
- multi agent
- learning agent
- learning process
- supervised learning
- reward shaping
- learning problems
- transfer learning
- optimal policy
- peer to peer
- action selection
- single agent
- state action
- reinforcement learning methods
- robotic control