Stacked calibration of off-policy policy evaluation for video game matchmaking.
Eric Thibodeau-Laufer, Raul Chandias Ferrari, Li Yao, Olivier Delalleau, Yoshua Bengio
Published in: CIG (2013)
Keyphrases
- video games
- policy evaluation
- least squares
- temporal difference
- Monte Carlo
- reinforcement learning
- model-free
- Markov decision processes
- policy iteration
- matrix inversion
- learning experience
- function approximation
- game play
- variance reduction
- educational games
- semi-parametric
- computer games
- game design
- game playing
- optimal policy
- evaluation function
- step size
- reinforcement learning algorithms
- partially observable Markov decision processes
- e-learning
- sufficient conditions
- decision problems