Learning Adversarial MDPs with Bandit Feedback and Unknown Transition.

Tiancheng Jin, Haipeng Luo

Published in: CoRR (2019)

Keyphrases

reinforcement learning
learning process
state space
supervised learning
learning systems
active learning
dynamic programming
learning algorithm
online learning
learning analytics
prior knowledge
optimal policy
continuous state spaces