DCAC: Reducing Unnecessary Conservatism in Offline-to-online Reinforcement Learning.
Dongxiang Chen, Ying Wen
Published in: DAI (2023)
Keyphrases
- reinforcement learning
- real time
- online learning
- function approximation
- balancing exploration and exploitation
- machine learning
- multi agent
- optimal policy
- supervised learning
- stochastic approximation
- cross cultural
- reinforcement learning algorithms
- temporal difference
- Markov decision processes
- state space
- mobile robot
- objective function
- information systems
- social networks
- data mining