Interplanetary transfers via deep representations of the optimal policy and/or of the value function.
Dario Izzo, Ekin Öztürk, Marcus Märtens
Published in: GECCO (Companion) (2019)
Keyphrases
- optimal policy
- state space
- reinforcement learning
- Markov decision processes
- decision problems
- infinite horizon
- finite horizon
- dynamic programming
- state dependent
- long run
- multistage
- average reward
- finite state
- serial inventory systems
- Markov decision process
- Bayesian reinforcement learning
- reward function
- sample path
- average cost
- policy iteration
- sufficient conditions
- control policies
- Markov decision problems
- partially observable Markov decision processes
- production planning
- Markov chain