The Distributional Reward Critic Architecture for Perturbed-Reward Reinforcement Learning.
Xi Chen, Zhihui Zhu, Andrew Perrault
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- function approximation
- reinforcement learning algorithms
- policy gradient
- temporal difference
- actor critic
- eligibility traces
- reward function
- management system
- learning capabilities
- multi agent
- learning algorithm
- real time
- state space
- machine learning
- partially observable environments
- average reward
- model free
- partially observable
- software architecture
- Markov decision processes
- dynamic programming
- learning process
- long run
- optimal control
- optimal policy
- policy search
- inverse reinforcement learning