A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward.

Published in: CoRR (2016)

Keyphrases