InBEDE: Integrating Contextual Bandit with TD Learning for Joint Pricing and Dispatch of Ride-Hailing Platforms.
Haipeng Chen
Yan Jiao
Zhiwei (Tony) Qin
Xiaocheng Tang
Hao Li
Bo An
Hongtu Zhu
Jieping Ye
Published in:
ICDM (2019)
Keyphrases
TD learning
contextual bandit
temporal difference
evaluation function
upper confidence bound
function approximation
reinforcement learning
reinforcement learning algorithms
model free
policy evaluation
machine learning
support vector
Monte Carlo
multi step
news recommendation