Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions.
Kai Xu
Farid Tajaddodianfar
Ben Allison
Published in: CoRR (2024)
Keyphrases
multi-armed bandits
bandit problems
multi-armed bandit
reward function
decision problems
reinforcement learning
optimal policy
expected reward
machine learning
similarity measure
loss function
expected utility
total reward