Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs.
Dan Qiao, Ming Yin, Yu-Xiang Wang
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- markov decision processes
- average cost
- state space
- function approximators
- optimal policy
- function approximation
- reinforcement learning algorithms
- markov decision problems
- markov decision process
- control problems
- learning algorithm
- state and action spaces
- model free
- optimal control
- partially observable
- continuous state and action spaces
- policy iteration
- linear space
- total cost
- action sets
- dynamic programming
- policy search
- cost sensitive
- model based reinforcement learning
- factored markov decision processes
- square loss
- factored mdps
- continuous state
- machine learning
- finite horizon
- worst case
- initial state
- sufficient conditions
- infinite horizon