Handling stochastic reward delays in machine reinforcement learning.
Jeffrey S. Campbell, Sidney Nascimento Givigi, Howard M. Schwartz
Published in: CCECE (2015)
Keyphrases
- reinforcement learning
- direct policy search
- learning automata
- function approximation
- control policies
- state space
- stochastic approximation
- timed petri nets
- eligibility traces
- reinforcement learning algorithms
- temporal difference
- model free
- learning algorithm
- multi agent
- optimal policy
- machine learning
- dynamic programming
- monte carlo
- learning problems
- reward shaping
- multi armed bandit
- stochastic processes
- learning agent
- supervised learning
- reinforcement learning methods
- control policy
- continuous state spaces
- batch processing
- control problems
- markov decision processes
- reward function
- temporal difference learning
- state transition
- state dependent
- partially observable
- stochastic model
- total reward
- learning classifier systems