Login / Signup
Risk-averse trees for learning from logged bandit feedback.
Francesco Trovò
Stefano Paladino
Paolo Simone
Marcello Restelli
Nicola Gatti
Published in:
IJCNN (2017)
Keyphrases
</>
learning algorithm
risk averse
reinforcement learning
multi objective
steady state
active learning
probability distribution
supply chain
optimization methods
decision theoretic