Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration and Planning.
Reda Ouhamma, Debabrota Basu, Odalric-Ambrym Maillard. Published in: CoRR (2022)
Keyphrases
- exponential family
- regret bounds
- Markov decision processes
- Bregman divergences
- maximum likelihood
- graphical models
- density estimation
- log likelihood
- closed form
- statistical models
- mixture model
- missing values
- KL divergence
- reinforcement learning
- state space
- order statistics
- variational methods
- dynamic programming
- probability density function
- Markov chain Monte Carlo
- lower bound
- approximate inference
- hidden variables
- hyperparameters
- probabilistic inference
- optimal policy
- upper bound