Publication: Dyna-like reinforcement learning based on accumulative and average rewards.