APAC: Authorized Probability-controlled Actor-Critic For Offline Reinforcement Learning.
Jing Zhang, Chi Zhang, Wenjia Wang, Bing-Yi Jing
Published in: CoRR (2023)
Keyphrases
- actor critic
- reinforcement learning
- policy gradient
- reinforcement learning algorithms
- optimal control
- temporal difference
- approximate dynamic programming
- function approximation
- neuro fuzzy
- gradient method
- state space
- policy iteration
- markov decision processes
- control problems
- average reward
- policy gradient methods
- natural actor critic
- optimal policy
- dynamic programming
- machine learning
- neural network
- probability distribution
- temporal difference learning
- multi agent
- transfer learning
- control system
- rl algorithms
- multi agent systems
- optimal solution