Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games.

Gabriele Farina, Tuomas Sandholm

Published in: AAAI (2021)

Keyphrases

model free
online learning
sequential decision making problems
reinforcement learning
reinforcement learning algorithms
function approximation
temporal difference
decision theoretic planning
e learning
average reward
policy iteration
game theory
stochastic games
multi agent
markov decision problems
nash equilibria
partially observable markov decision processes
learning algorithm
game playing
state space
nash equilibrium
active learning
machine learning
optimal policy
sufficient conditions
learning agent
dynamic programming
learning process