LTLf/LDLf Non-Markovian Rewards.
Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi
Published in: AAAI (2018)
Keyphrases
- reinforcement learning
- reward function
- Markov decision processes
- decision processes
- situation calculus
- state space
- multi-armed bandit
- stochastic process
- optimal policy
- bandit problems
- learning algorithm
- multi-agent
- reinforcement learning algorithms
- neural network
- reinforcement learning agents
- free riding
- state variables
- transfer learning
- machine learning
- transition probabilities
- stochastic processes
- dynamic programming
- information systems
- real time
- multi-armed bandits
- long term and short term