Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications.
Manuel Schneckenreither
Published in: CoRR (2020)
Keyphrases
- average reward
- optimal policy
- reinforcement learning
- Markov decision processes
- long run
- state space
- state and action spaces
- dynamic programming
- semi-Markov decision processes
- discounted reward
- decision problems
- optimality criterion
- total reward
- reinforcement learning algorithms
- finite horizon
- infinite horizon
- model-free
- policy iteration
- actor-critic
- sample path
- Markov decision process
- initial state
- multistage
- finite state
- reward function
- policy gradient
- discount factor
- learning algorithm
- data mining
- hierarchical reinforcement learning
- Markov decision problems
- partially observable Markov decision processes
- partially observable
- temporal difference
- multi-agent
- machine learning
- inventory level
- optimal solution
- cost function
- sufficient conditions
- average cost
- function approximation