Off-Policy Average Reward Actor-Critic with Deterministic Policy Search.
Naman Saxena
Subhojyoti Khastigir
Shishir Kolathaya
Shalabh Bhatnagar
Published in:
CoRR (2023)
Keyphrases
policy gradient
average reward
actor critic
markov decision processes
optimal policy
reinforcement learning
long run
partially observable markov decision processes
stochastic games
model free
reinforcement learning algorithms
function approximation
markov chain
state action
policy iteration
reward function
state space
approximation methods
gradient method
optimal control
multi agent
variance reduction
belief state
neural network
single agent
dynamical systems
dynamic programming
temporal difference
infinite horizon
decision problems
learning algorithm