Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning.
Hui Wang, Mike Preuss, Michael Emmerich, Aske Plaat
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- function approximation
- integer programming
- state space
- Markov decision processes
- reward function
- learning algorithm
- eligibility traces
- model free
- reinforcement learning algorithms
- supervised learning
- partially observable environments
- temporal difference
- transfer learning
- learning agent
- action selection
- dynamic programming
- reinforcement learning methods
- finite state
- learning problems
- Markov chain
- policy iteration
- function approximators
- temporal difference learning
- policy gradient
- objective function
- data sets