Reward is enough for convex MDPs.
Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- Markov decision processes
- reward function
- average reward
- state space
- optimal policy
- discounted reward
- long run
- convex hull
- convex optimization
- total reward
- piecewise linear
- expected reward
- policy iteration
- partially observable
- semi-Markov decision processes
- finite horizon
- function approximation
- model free
- factored MDPs
- inverse reinforcement learning
- Markov decision problems
- reinforcement learning algorithms
- stationary policies
- finite state
- state and action spaces
- planning under uncertainty
- action space
- convex functions
- decision diagrams
- policy search
- state action
- multiple agents
- temporal difference
- convex relaxation
- model based reinforcement learning
- real time dynamic programming