Dual-Policy-Guided Offline Reinforcement Learning with Optimal Stopping.
Weibo Jiang, Shaohui Li, Zhi Li, Yuxin Ke, Zhizhuo Jiang, Yaowen Li, Yu Liu
Published in: AAMAS (2024)
Keyphrases
- optimal stopping
- finite horizon
- optimal policy
- reinforcement learning
- Markov decision process
- Markov decision processes
- infinite horizon
- state space
- decision problems
- dynamic programming
- long run
- finite state
- multistage
- state dependent
- reward function
- Brownian motion
- optimal control
- average cost
- asymptotically optimal
- sufficient conditions
- partially observable
- Markov chain
- learning algorithm
- machine learning
- partially observable Markov decision processes