Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning.
Chengxing Jia, Chenxiao Gao, Hao Yin, Fuxiang Zhang, Xiong-Hui Chen, Tian Xu, Lei Yuan, Zongzhang Zhang, Zhi-Hua Zhou, Yang Yu
Published in: ICLR (2024)
Keyphrases
- optimal policy
- reinforcement learning
- policy search
- Markov decision process
- control policies
- reward function
- state space
- Markov decision processes
- policy gradient methods
- control policy
- Markov decision problems
- decision problems
- supervised learning
- finite horizon
- partially observable Markov decision processes
- dynamic programming
- reinforcement learning algorithms
- total reward
- state dependent
- approximate policy iteration
- policy iteration algorithm
- continuous state
- semi-Markov decision process
- long run
- policy iteration
- average reward
- partially observable environments
- management policies
- transport systems
- function approximation
- finite state
- infinite horizon
- partially observable
- action selection
- function approximators
- allocation policy
- multi-agent
- selective perception
- policy evaluation
- allocation policies
- state and action spaces
- multi-agent reinforcement learning
- revenue management
- sufficient conditions
- optimal control
- model-free
- temporal difference