Login / Signup
Learning Optimal Policies from Observational Data.
Onur Atan
William R. Zame
Mihaela van der Schaar
Published in:
CoRR (2018)
Keyphrases
</>
optimal policy
reinforcement learning
learning process
learning algorithm
markov decision processes
average reward reinforcement learning
state space
finite state
active learning
dynamic programming
sufficient conditions
least squares
random variables
multistage
decision problems
long run
finite horizon