Under-Approximating Expected Total Rewards in POMDPs.
Alexander Bork, Joost-Pieter Katoen, Tim Quatmann
Published in: TACAS (2) (2022)
Keyphrases
- reinforcement learning
- Markov decision processes
- total reward
- expected reward
- partially observable
- partially observable Markov decision processes
- optimal policy
- reward function
- belief state
- state space
- hidden Markov models
- function approximation
- optimal control
- finite state
- temporal difference
- expected profit
- bandit problems
- learning algorithm