Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning.
Hui Wang, Mike Preuss, Michael Emmerich, Aske Plaat
Published in: SYNASC (2020)
Keyphrases
- reinforcement learning
- eligibility traces
- function approximation
- model free
- partially observable environments
- reinforcement learning algorithms
- integer programming
- state space
- machine learning
- learning agent
- temporal difference
- optimal policy
- dynamic programming
- learning algorithm
- average reward
- markov decision processes
- partially observable
- policy iteration
- descending order
- neural network
- temporal difference learning
- partially observable markov decision processes
- learning process
- optimal control
- objective function
- search engine
- function approximators
- multi agent
- action selection
- policy evaluation
- multi agent reinforcement learning
- state and action spaces
- learning problems