Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff.
Jian Qian, Haichen Hu, David Simchi-Levi
Published in: CoRR (2024)
Keyphrases
- efficient learning
- exploration exploitation tradeoff
- reinforcement learning
- markov decision processes
- function approximation
- state space
- learning algorithm
- relevance feedback
- objective function
- optimal policy
- membership queries
- pattern languages
- markov decision problems
- temporal difference
- reinforcement learning algorithms
- markov decision process
- data mining
- active learning
- policy evaluation
- average reward
- decision processes
- policy iteration
- model free
- database
- supervised learning
- learning process
- dynamic programming