Optimal Policy Learning with Observational Data in Multi-Action Scenarios: Estimation, Risk Preference, and Potential Failures.
Giovanni Cerulli
Published in: CoRR (2024)
Keyphrases
- optimal policy
- reinforcement learning
- average reward reinforcement learning
- decision problems
- experimental data
- Markov decision processes
- learning algorithm
- learning process
- long run
- state space
- active learning
- initial state
- Bayesian reinforcement learning
- decision makers
- sufficient conditions
- NP-hard
- sample size
- data mining
- action selection
- finite horizon
- lost sales
- causal Bayesian networks