Model-Free Preference-Based Reinforcement Learning

Christian Wirth, Johannes Fürnkranz, Gerhard Neumann

Published in: AAAI (2016)

Keyphrases

model free
reinforcement learning
reinforcement learning algorithms
function approximation
temporal difference
policy iteration
rl algorithms
learning algorithm
policy evaluation
state space
neural network
multi agent
temporal difference learning
reinforcement learning methods
markov decision processes
genetic algorithm
average reward
machine learning