Finite-Memory Strategies in POMDPs with Long-Run Average Objectives.
Krishnendu Chatterjee, Raimundo Saona, Bruno Ziliotto
Published in: Math. Oper. Res. (2022)
Keyphrases
- long run
- average cost
- short run
- optimal policy
- infinite horizon
- average reward
- expected cost
- markov decision processes
- reinforcement learning
- stationary policies
- finite number
- partially observable markov decision processes
- partially observable
- heavy traffic
- queueing networks
- decision problems
- state space
- dynamic programming
- exchange rate
- markov decision problems
- belief state
- optimal strategy
- finite state
- long term
- search algorithm
- data mining
- control policy
- holding cost
- non stationary
- customer classes