Online Reinforcement Learning for Periodic MDP.
Ayush Aniket, Arpan Chattopadhyay
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- markov decision processes
- markov decision process
- optimal policy
- state space
- function approximation
- partially observable
- reward function
- state and action spaces
- online learning
- temporal difference learning
- reinforcement learning algorithms
- markov decision problems
- action sets
- machine learning
- balancing exploration and exploitation
- transition model
- model free
- temporal difference
- real time
- utility function
- supervised learning
- dynamic programming
- learning algorithm
- neural network
- action space
- batch mode
- online communities