An exact solution in Markov decision process with multiplicative rewards as a general framework.
Yuan Yao, Xiaolin Sun
Published in: CoRR (2020)
Keyphrases
- exact solution
- markov decision process
- markov decision processes
- reinforcement learning
- reward function
- state space
- optimal policy
- lower bound
- finite horizon
- column generation
- exact algorithms
- approximate solutions
- finite state
- infinite horizon
- policy iteration
- partial observability
- dynamic programming
- reinforcement learning algorithms
- partially observable
- machine learning
- multi agent
- optimal solution
- stationary policies
- dynamical systems
- initial state
- special case
- np hard
- multiple agents
- upper bound
- linear programming