Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning.
Anton Bakhtin, David J. Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H. Miller, Noam Brown
Published in: CoRR (2022)
Keyphrases
- Nash equilibrium
- reinforcement learning
- game theory
- game-theoretic
- stochastic games
- least squares
- action selection
- function approximation
- partially observable
- deterministic domains
- blocks world
- game playing
- planning problems
- heuristic search
- multi-agent
- human subjects
- optimal control
- government agencies
- Markov decision processes
- model-free
- domain-independent
- complex domains
- e-government
- temporal difference learning
- partial observability
- human players
- case-based planning
- learning algorithm