AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract).
Kevin Du, Ian Gemp, Yi Wu, Yingying Wu
Published in: AAAI (2023)
Keyphrases
- markov decision process
- policy iteration
- np hard
- initial state
- markov decision processes
- finite state
- optimal policy
- state space
- reinforcement learning
- infinite horizon
- sample path
- finite horizon
- temporal difference learning
- linear programming
- average reward
- special case
- transition matrices
- computational complexity
- decision problems
- markov games
- optimal solution
- factored mdps
- model free
- bayesian networks
- average cost
- markov chain
- dynamic programming
- fixed point
- markov decision problems
- least squares
- temporal difference
- transition probabilities
- situation calculus
- probability distribution
- reinforcement learning algorithms
- reward function
- learning algorithm