What About Taking Policy as Input of Value Function: Policy-extended Value Function Approximator.
Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Wulong Liu, Yaodong Yang
Published in: CoRR (2020)
Keyphrases
- function approximators
- function approximation
- reinforcement learning problems
- reinforcement learning
- policy gradient
- policy gradient methods
- optimal policy
- temporal difference
- neural network
- state action
- control policy
- Markov decision problems
- action space
- temporal difference methods
- policy search
- reinforcement learning algorithms
- learning tasks
- kernel methods
- training set
- machine learning