Reinforcement learning with state-dependent discount factor.
Naoto Yoshida, Eiji Uchibe, Kenji Doya
Published in: ICDL-EPIROB (2013)
Keyphrases
- state dependent
- optimal policy
- reinforcement learning
- markov decision processes
- markov decision problems
- state space
- average reward
- decision problems
- dynamic programming
- finite state
- multistage
- finite horizon
- long run
- infinite horizon
- model free
- function approximation
- markov decision process
- reinforcement learning algorithms
- average cost
- partially observable
- asymptotically optimal
- steady state
- policy iteration
- temporal difference
- inventory level
- initial state
- partially observable markov decision processes
- sufficient conditions
- multi agent
- reward function
- lost sales