Discovering a set of policies for the worst case reward.
Tom Zahavy
André Barreto
Daniel J. Mankowitz
Shaobo Hou
Brendan O'Donoghue
Iurii Kemaev
Satinder Baveja Singh
Published in:
CoRR (2021)
Keyphrases
worst case
small number
NP-hard
neural network
information systems
reinforcement learning
error bounds
reward function