DCAC: Reducing Unnecessary Conservatism in Offline-to-online Reinforcement Learning.

Dongxiang Chen, Ying Wen

Published in: DAI (2023)

Keyphrases

reinforcement learning
real time
online learning
function approximation
balancing exploration and exploitation
machine learning
multi agent
optimal policy
supervised learning
stochastic approximation
cross cultural
reinforcement learning algorithms
temporal difference
markov decision processes
state space
mobile robot
objective function
information systems
social networks
data mining