Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding.

Alizée Pace, Hugo Yèche, Bernhard Schölkopf, Gunnar Rätsch, Guy Tennenholtz

Published in: CoRR (2023)

Keyphrases

reinforcement learning
function approximation
temporal difference
state space
real time
reinforcement learning algorithms
Markov decision processes
control problems
data mining
machine learning
dynamic programming
data sets
model free
optimal policy
databases
support vector
objective function
website
artificial intelligence
learning algorithm
genetic algorithm
partially observable
robot control
multi agent reinforcement learning
robotic control