Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature.
Kefan Dong, Jiaqi Yang, Tengyu Ma
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- model free
- state space
- function approximation
- reinforcement learning algorithms
- virtual reality
- virtual world
- multiscale
- machine learning
- augmented reality
- nonlinear functions
- multi agent
- markov chain
- multi armed bandit
- curvature estimation
- fully unsupervised
- temporal difference
- learning process
- robotic control
- transition model
- multi agent reinforcement learning
- control system
- random sampling
- dynamic programming
- markov decision processes
- neural network
- least squares
- supervised learning