Speedup Training Artificial Intelligence for Mahjong via Reward Variance Reduction.
Jinqiu Li, Shuang Wu, Haobo Fu, Qiang Fu, Enmin Zhao, Junliang Xing
Published in: CoG (2022)
Keyphrases
- variance reduction
- artificial intelligence
- gradient estimation
- policy gradient
- Monte Carlo
- sample size
- training process
- bias variance decomposition
- training examples
- trade-off
- importance sampling
- reinforcement learning
- supervised learning
- Markov random field
- test set
- learning algorithm
- dynamic systems
- naive Bayes classifier
- training set
- lower bound
- quasi-Monte Carlo