Inferring bounds on the performance of a control policy from a sample of trajectories.
Raphaël Fonteneau, Susan A. Murphy, Louis Wehenkel, Damien Ernst
Published in: ADPRL (2009)
Keyphrases
- control policy
- long run
- reinforcement learning
- approximate dynamic programming
- admission control
- control policies
- batch mode
- lower bound
- upper bound
- moving object trajectories
- upper and lower bounds
- error bounds
- lower and upper bounds
- average cost
- trajectory data
- worst case
- moving objects
- collision free
- real time
- model checking
- sample size
- computational complexity
- multi agent
- machine learning