Learning Adversarial MDPs with Bandit Feedback and Unknown Transition.
Tiancheng Jin
Haipeng Luo
Published in:
CoRR (2019)
Keyphrases
reinforcement learning
learning process
state space
supervised learning
learning systems
active learning
dynamic programming
learning algorithm
online learning
learning analytics
prior knowledge
optimal policy
continuous state spaces