A Maximum Divergence Approach to Optimal Policy in Deep Reinforcement Learning.
Zhiyou Yang, Hong Qu, Mingsheng Fu, Wang Hu, Yongze Zhao
Published in: IEEE Trans. Cybern. (2023)
Keyphrases
- optimal policy
- reinforcement learning
- Markov decision processes
- decision problems
- state space
- finite horizon
- finite state
- infinite horizon
- multistage
- dynamic programming
- long run
- state dependent
- Markov decision process
- average cost
- sufficient conditions
- Bayesian reinforcement learning
- lost sales
- Markov decision problems
- average reward
- model free
- policy iteration
- machine learning
- initial state
- temporal difference
- control policies
- multi agent
- function approximation
- total reward
- serial inventory systems