Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations.
Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen
Published in: CoRR (2024)
Keyphrases
- reward function
- expected reward
- reinforcement learning
- optimal policy
- Markov decision processes
- total reward
- control policy
- bandit problems
- average reward
- optimization algorithm
- state space
- partially observable environments
- discounted reward
- optimization problems
- inverse reinforcement learning
- optimization methods
- partially observable
- sparse PCA
- Bayes risk
- reinforcement learning algorithms
- joint optimization
- multiple agents
- transition probabilities
- state action
- policy gradient
- genetic algorithm
- infinite horizon
- optimization process
- sparse coding
- optimization method
- decision problems
- long term and short term