Specifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces (Preliminary Version).
Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi
Published in: CoRR (2017)
Keyphrases
- preliminary version
- reinforcement learning
- markov decision processes
- reward function
- state and action spaces
- decision processes
- multiarmed bandit
- state space
- markov decision process
- reinforcement learning algorithms
- action space
- finite number
- finite state
- expressive power
- optimal policy
- learning algorithm
- dynamic programming
- factored mdps
- markov decision problems
- multiple agents
- partially observable
- average reward
- deductive databases
- policy iteration
- finite horizon
- model free
- discounted reward
- stochastic process
- stationary policies
- decision diagrams
- multi agent
- decision problems
- semi markov decision processes
- abstract machine
- reinforcement learning agents
- transition probabilities
- decision theoretic planning
- partially observable markov decision process
- planning under uncertainty