Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection.
Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
Published in: NeurIPS (2021)
Keyphrases
- reinforcement learning
- markov decision processes
- reward function
- state space
- optimal policy
- decision theoretic planning
- state and action spaces
- partially observable
- function approximation
- machine learning
- policy search
- dynamic programming
- finite state
- loss function
- average reward
- total reward
- decision diagrams
- function approximators
- policy iteration
- control problems
- e learning
- long run
- factored mdps
- action sets
- learning algorithm