A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs.

Dirk van der Hoeven, Lukas Zierahn, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi
Published in: CoRR (2023)
Keyphrases
  • quantitative analysis
  • statistical analysis
  • stochastic systems
  • reinforcement learning
  • case study
  • objective function
  • state space
  • least squares
  • sufficient conditions
  • Markov decision processes
  • multi-armed bandits