An exact solution in Markov decision process with multiplicative rewards as a general framework.

Yuan Yao, Xiaolin Sun

Published in: CoRR (2020)

Keyphrases

exact solution
markov decision process
markov decision processes
reinforcement learning
reward function
state space
optimal policy
lower bound
finite horizon
column generation
exact algorithms
approximate solutions
finite state
infinite horizon
policy iteration
partial observability
dynamic programming
reinforcement learning algorithms
partially observable
machine learning
multi agent
optimal solution
stationary policies
dynamical systems
initial state
special case
np hard
multiple agents
upper bound
linear programming