Online Reinforcement Learning with Uncertain Episode Lengths.

Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

Published in: AAAI (2023)

Keyphrases

reinforcement learning
online learning
decision making
function approximation
dynamic programming
state space
optimal policy
Markov decision processes
reinforcement learning algorithms
balancing exploration and exploitation
neural network
information retrieval
email
learning problems