Transition-based versus state-based reward functions for MDPs with Value-at-Risk.
Shuai Ma, Jia Yuan Yu
Published in: Allerton (2017)
Keyphrases
- reward function
- transition model
- state space
- markov decision processes
- markov decision process
- transition probabilities
- state variables
- optimal policy
- reinforcement learning
- factored mdps
- reinforcement learning algorithms
- state transition
- finite state
- partially observable
- state transitions
- policy search
- multiple agents
- initial state
- markov decision problems
- markov chain
- initially unknown
- generative model
- particle filter
- action space
- inverse reinforcement learning
- search algorithm