Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization.
Talha Bozkus, Urbashi Mitra
Published in: CoRR (2024)
Keyphrases
- markov decision process
- optimal policy
- state space
- reinforcement learning
- temporal difference learning
- policy iteration
- state action
- markov decision processes
- markov games
- reward function
- infinite horizon
- finite horizon
- dynamic programming
- decision problems
- learning algorithm
- reinforcement learning algorithms
- hierarchical reinforcement learning
- long run
- multistage
- initial state
- finite state
- function approximation
- action space
- average reward
- average cost
- markov decision problems
- sufficient conditions
- temporal difference
- transition probabilities
- state variables
- search space
- state dependent
- inventory level
- multi agent
- objective function
- higher order