A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs.
Dirk van der Hoeven
Lukas Zierahn
Tal Lancewicki
Aviv Rosenberg
Nicolò Cesa-Bianchi
Published in:
CoRR (2023)
Keyphrases
quantitative analysis
statistical analysis
stochastic systems
reinforcement learning
case study
objective function
state space
least squares
sufficient conditions
Markov decision processes
multi armed bandits