RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning.
Huale Li, Xuan Wang, Fengwei Jia, Yifan Li, Yulin Wu, Jiajia Zhang, Shuhan Qi
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- function approximation
- reward function
- temporal difference
- total reward
- online learning
- multi-armed bandit
- loss function
- Markov decision processes
- lower bound
- learning algorithm
- reinforcement learning algorithms
- expert advice
- state space
- policy search
- temporal difference learning
- optimal policy
- action selection
- optimal control
- multi-agent systems
- deep learning
- learning agent
- function approximators
- minimax regret
- multi-class
- dynamic programming
- confidence bounds
- multi-armed bandit problems