Policy learning in continuous-time Markov decision processes using Gaussian Processes.
Ezio Bartocci, Luca Bortolussi, Tomás Brázdil, Dimitrios Milios, Guido Sanguinetti
Published in: Perform. Evaluation (2017)
Keyphrases
- Markov decision processes
- Gaussian processes
- optimal policy
- reinforcement learning
- partially observable
- learning algorithm
- preference learning
- learning process
- state space
- infinite horizon
- dynamic programming
- policy iteration
- Gaussian process
- learning tasks
- active learning
- Markov decision process
- multi task
- continuous state spaces
- average reward
- search algorithm
- real time dynamic programming
- finite state
- optimal control
- incremental learning
- supervised learning