DeepTOP: Deep Threshold-Optimal Policy for MDPs and RMABs.

Khaled Nakhleh, I-Hong Hou

Published in: CoRR (2022)

Keyphrases

optimal policy
markov decision processes
finite horizon
state space
reinforcement learning
decision problems
finite state
dynamic programming
average cost
average reward
initial state
markov decision process
infinite horizon
markov decision problems
multistage
state dependent
policy iteration
long run average cost
long run
control policies
dynamic programming algorithms
discount factor
sufficient conditions
bayesian reinforcement learning
inventory level
total cost
function approximation
expected reward