On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies.
Haozhi Wang, Qing Wang, Yunfeng Shao, Dong Li, Jianye Hao, Yinchuan Li
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- optimal policy
- stochastic approximation
- policy search
- control policies
- state space
- Markov decision process
- theoretical framework
- reward function
- supervised learning
- multi-agent
- e-learning
- machine learning
- function approximation
- adaptive learning
- meta level
- partially observable Markov decision processes
- learning process
- learning algorithm