Under-Approximating Expected Total Rewards in POMDPs.
Alexander Bork, Joost-Pieter Katoen, Tim Quatmann
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- Markov decision processes
- total reward
- partially observable Markov decision processes
- expected reward
- belief state
- optimal policy
- partially observable
- dynamic programming
- data sets
- state space
- finite state
- expected profit
- model free
- long term and short term
- free riding
- average reward
- reward function
- function approximation
- decision problems
- multi agent
- neural network