RLCFR: Minimize counterfactual regret by deep reinforcement learning.
Huale Li, Xuan Wang, Fengwei Jia, Yulin Wu, Jiajia Zhang, Shuhan Qi
Published in: Expert Syst. Appl. (2022)
Keyphrases
- reinforcement learning
- online learning
- function approximation
- reward function
- total reward
- expert advice
- state space
- markov decision processes
- confidence bounds
- robotic control
- optimal policy
- loss function
- reinforcement learning algorithms
- binary classification
- multi-armed bandit
- minimax regret
- model-free
- data sets
- worst case
- learning algorithm
- machine learning
- action selection
- partially observable
- game theory
- multi-agent reinforcement learning
- transfer learning
- lower bound
- regret minimization