Infinite Horizon Multi-armed Bandits with Reward Vectors: Exploration/Exploitation Trade-off.
Madalina M. Drugan. Published in: ICAART (Revised Selected Papers) (2015)
Keyphrases
- infinite horizon
- multi-armed bandits
- bandit problems
- long run
- multi-armed bandit
- optimal policy
- decision problems
- finite horizon
- reinforcement learning
- optimal control
- markov decision processes
- production planning
- partially observable
- stochastic demand
- total reward
- dynamic programming
- single item
- average cost
- state space
- markov decision problems
- markov decision process
- average reward
- fixed cost
- stationary policies
- lead time
- inventory level
- search algorithm
- reward function
- control system
- partially observable markov decision processes
- queueing networks