Risk-averse trees for learning from logged bandit feedback.

Francesco Trovò, Stefano Paladino, Paolo Simone, Marcello Restelli, Nicola Gatti
Published in: IJCNN (2017)
Keyphrases
  • learning algorithm
  • risk averse
  • reinforcement learning
  • multi objective
  • steady state
  • active learning
  • probability distribution
  • supply chain
  • optimization methods
  • decision theoretic