)-optimal policy for the online selection of a monotone subsequence from a random sample.
Alessandro Arlotto, Yehua Wei, Xinchang Xie
Published in: Random Struct. Algorithms (2018)
Keyphrases
- optimal policy
- random sample
- Markov decision processes
- finite horizon
- state space
- reinforcement learning
- infinite horizon
- random sampling
- dynamic programming
- state dependent
- long run
- sample size
- sufficient conditions
- Markov decision process
- average reward
- multistage
- lost sales
- Boolean functions
- upper bound
- version space
- machine learning