Deconfounding Reinforcement Learning in Observational Settings.
Chaochao Lu, Bernhard Schölkopf, José Miguel Hernández-Lobato
Published in: CoRR (2018)
Keyphrases
- reinforcement learning
- function approximation
- state space
- model free
- control problems
- multi agent
- learning problems
- direct policy search
- reinforcement learning algorithms
- optimal policy
- supervised learning
- Markov decision processes
- machine learning
- reward function
- learning algorithm
- temporal difference
- learning process
- artificial intelligence
- temporal difference learning
- reinforcement learning methods
- causal inference