Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards.
Joshua R. Bertram, Xuxi Yang, Peng Wei
Published in: CoRR (2018)
Keyphrases
- Markov decision processes
- reinforcement learning
- reward function
- fully observable
- online learning
- state space
- machine learning
- finite state
- optimal policy
- neural network
- sparse coding
- high dimensional
- reinforcement learning algorithms
- average cost
- finite horizon
- dynamic programming
- decision problems
- lower bound
- Markov decision problems
- real time
- multi-armed bandit