Thompson Sampling for Infinite-Horizon Discounted Decision Processes.
Daniel Adelman, Cagla Keceli, Alba V. Olivares-Nadal
Published in: CoRR (2024)
Keyphrases
- infinite horizon
- decision processes
- markov decision processes
- optimal policy
- decision problems
- finite horizon
- stochastic demand
- state space
- dynamic programming
- long run
- production planning
- decision making
- average cost
- optimal control
- finite state
- single item
- partially observable
- policy iteration
- reinforcement learning
- markov decision process
- decision process
- reasoning process
- lead time
- partially observable markov decision processes
- average reward
- lost sales
- markov decision problems
- fixed cost