Ctrl-Z: Recovering from Instability in Reinforcement Learning.

Vibhavari Dasagi, Jake Bruce, Thierry Peynot, Jürgen Leitner

Published in: CoRR (2019)

Keyphrases

reinforcement learning
function approximation
reinforcement learning algorithms
optimal policy
multi agent
learning algorithm
direct policy search
Markov decision processes
state space
control problems
robotic control
model free
temporal difference learning
neural network
evolutionary algorithm
control system
dynamic programming
learning process
multi agent systems
video sequences
knowledge base
multi agent reinforcement learning