Process-Oriented Planning and Average-Reward Optimality.
Craig Boutilier, Martin L. Puterman
Published in: IJCAI (1995)
Keyphrases
- average reward
- process oriented
- goal oriented
- optimal policy
- long run
- markov decision processes
- optimality criterion
- semi markov decision processes
- reinforcement learning
- partially observable markov decision processes
- computer supported
- policy iteration
- markov chain
- model free
- discounted reward
- state and action spaces
- heuristic search
- planning problems
- hierarchical reinforcement learning
- finite state
- collaborative learning
- initial state
- infinite horizon
- partially observable
- computer supported collaborative learning
- least squares
- dynamic programming
- search space