Online Reinforcement Learning for Periodic MDP

Ayush Aniket, Arpan Chattopadhyay

Published in: CoRR (2022)

Keyphrases

reinforcement learning
Markov decision processes
Markov decision process
optimal policy
state space
function approximation
partially observable
reward function
state and action spaces
online learning
temporal difference learning
reinforcement learning algorithms
Markov decision problems
action sets
machine learning
balancing exploration and exploitation
transition model
model free
temporal difference
real time
utility function
supervised learning
dynamic programming
learning algorithm
neural network
action space
batch mode
online communities