A Heuristic Q-Learning Architecture for Fully Exploring a World and Deriving an Optimal Policy by Model-Based Planning.
Gang Zhao, Shoji Tatsumi, Ruoying Sun
Published in: ICRA (1999)
Keyphrases
- optimal policy
- dynamic programming
- average reward reinforcement learning
- dynamic programming algorithms
- state space
- reinforcement learning
- decision problems
- Markov decision processes
- finite horizon
- Markov decision problems
- partially observable Markov decision processes
- long run
- state dependent
- initial state
- infinite horizon
- planning problems
- finite state
- multistage
- model free
- average reward
- Bayesian reinforcement learning
- production planning
- search algorithm
- linear programming
- function approximation
- Markov decision process
- policy iteration
- optimal solution
- control policies
- decision theoretic
- reward function
- machine learning
- heuristic search
- multi agent
- optimal control
- Monte Carlo
- serial inventory systems