Simultaneous discovery of multiple alternative optimal policies by reinforcement learning.
Petar Kormushev, Darwin G. Caldwell
Published in: IEEE Conf. of Intelligent Systems (2012)
Keyphrases
- optimal policy
- reinforcement learning
- Markov decision processes
- state space
- dynamic programming
- decision problems
- policy iteration
- multistage
- finite state
- finite horizon
- long run
- state dependent
- function approximation
- initial state
- Markov decision process
- machine learning
- reinforcement learning algorithms
- control policies
- infinite horizon
- reward function
- average reward
- dynamic programming algorithms
- sufficient conditions
- partially observable Markov decision processes
- computational complexity
- search algorithm
- average reward reinforcement learning