Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards.

Joshua R. Bertram, Xuxi Yang, Peng Wei

Published in: CoRR (2018)

Keyphrases

markov decision processes
reinforcement learning
reward function
fully observable
online learning
state space
machine learning
finite state
optimal policy
neural network
sparse coding
high dimensional
reinforcement learning algorithms
average cost
finite horizon
dynamic programming
decision problems
lower bound
markov decision problems
real time
multi-armed bandit