Achieving Correlated Equilibrium by Studying Opponent's Behavior Through Policy-Based Deep Reinforcement Learning.
Kuo Chun Tsai, Zhu Han
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- agent receives
- Markov decision processes
- action selection
- reinforcement learning problems
- Markov decision process
- learning algorithm
- action space
- policy iteration
- partially observable
- state space
- function approximation
- Markov decision problems
- reward function
- model free
- real robot
- infinite horizon
- human behavior
- policy evaluation
- machine learning
- temporal difference
- reinforcement learning algorithms
- partially observable Markov decision processes
- control policy
- decision problems