Q-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control.
Dejan V. Djonin, Vikram Krishnamurthy
Published in: IEEE Trans. Signal Process. (2007)
Keyphrases
- Markov decision processes
- optimal policy
- reinforcement learning
- decentralized control
- learning algorithm
- reinforcement learning algorithms
- finite state
- dynamic programming
- policy iteration
- Markov decision process
- control policies
- decision processes
- reward function
- total reward
- finite horizon
- state space
- partially observable
- average cost
- macro actions
- multistage
- machine learning
- decision theoretic planning
- discounted reward
- transition matrices
- average reward
- control system
- action sets
- infinite horizon
- policy evaluation
- action space
- partially observable Markov decision processes
- expected reward
- control problems
- learning tasks
- semi-Markov decision processes