Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning.
Luofeng Liao, Zuyue Fu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- markov decision processes
- state space
- optimal policy
- markov decision process
- policy iteration
- dynamic programming
- reinforcement learning algorithms
- partially observable markov decision processes
- heuristic search
- real time
- learning algorithm
- function approximation
- model free
- average reward
- causal inference
- infinite horizon
- finite state
- markov decision chains
- action space
- causal reasoning
- multi agent
- bayesian networks
- function approximators
- causal models
- average cost
- action selection
- long run
- causal relationships
- optimal control
- decision problems
- transfer learning
- markov chain
- machine learning
- data mining