A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning.
Francisco M. Garcia, Philip S. Thomas
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- exploration strategy
- markov decision processes
- markov decision process
- model based reinforcement learning
- optimal policy
- state space
- action selection
- exploration exploitation
- action sets
- state and action spaces
- function approximation
- active exploration
- markov decision problems
- partially observable
- 21st century
- reward function
- exploration exploitation tradeoff
- learning algorithm
- model free
- dynamic programming
- reinforcement learning algorithms
- planning under uncertainty
- machine learning
- lifelong learning
- learning process
- total reward
- optimal control
- finite state
- unknown environments
- temporal difference
- approximate dynamic programming
- supervised learning
- action space
- policy iteration
- state abstraction
- transition model
- reinforcement learning problems
- multi agent
- partially observable markov decision processes
- bayesian reinforcement learning