LTLf/LDLf Non-Markovian Rewards.

Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi

Published in: AAAI (2018)

Keyphrases

reinforcement learning
reward function
Markov decision processes
decision processes
situation calculus
state space
multi-armed bandit
stochastic process
optimal policy
bandit problems
learning algorithm
multi-agent
reinforcement learning algorithms
neural network
reinforcement learning agents
free riding
state variables
transfer learning
machine learning
transition probabilities
stochastic processes
dynamic programming
information systems
real time
multi-armed bandits
long-term and short-term