The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning.
Harm van Seijen
Hadi Nekoei
Evan Racah
Sarath Chandar
Published in:
NeurIPS (2020)
Keyphrases
reinforcement learning
model free
online learning
reward function
lower bound
expert advice
markov decision processes
learning algorithm
human behavior
metric space
loss function
optimal policy
state space
multi agent
machine learning
distance measure
semi supervised
learning process
pairwise