Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow.
Wenjun Zeng, Yi Liu
Published in: CoRR (2021)
Keyphrases
- Markov decision process
- sequential decision making
- reinforcement learning
- decision problems
- optimal policy
- state space
- Markov decision processes
- interactive dynamic influence diagrams
- influence diagrams
- function approximation
- infinite horizon
- model free
- temporal difference
- reinforcement learning algorithms
- initial state
- supervised learning
- multi agent
- utility function
- dynamic programming
- transition probabilities
- special case
- lower bound