Bounding the Optimal Value Function in Compositional Reinforcement Learning.
Jacob Adamczyk, Volodymyr Makarenko, Argenis Arriojas, Stas Tiomkin, Rahul V. Kulkarni
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- control policy
- optimal control
- dynamic programming
- state space
- upper bound
- piecewise linear
- transfer learning
- lp norm
- optimality criterion
- single parameter
- temporal difference
- action selection
- optimal design
- Markov decision processes
- database
- worst case
- active learning
- search space
- decision making
- learning algorithm