Multi-Timescale Ensemble $Q$-Learning for Markov Decision Process Policy Optimization.
Talha Bozkus, Urbashi Mitra
Published in: IEEE Trans. Signal Process. (2024)
Keyphrases
- markov decision process
- optimal policy
- state space
- reinforcement learning
- policy iteration
- temporal difference learning
- markov games
- state action
- markov decision processes
- finite horizon
- reward function
- infinite horizon
- decision problems
- learning algorithm
- reinforcement learning algorithms
- initial state
- hierarchical reinforcement learning
- average cost
- long run
- function approximation
- state dependent
- cooperative
- average reward
- dynamic programming
- temporal difference
- partially observable
- transition probabilities
- finite state
- state variables
- multistage
- markov chain
- model free
- linear programming
- multi agent
- random walk
- sufficient conditions