Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions.
Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu
Published in: ROBIO (2023)
Keyphrases
- actor critic
- learning algorithm
- reinforcement learning
- inverse reinforcement learning
- policy gradient
- policy search
- k-means
- reward function
- dynamic programming
- Markov decision processes
- NP-hard
- cost function
- temporal difference
- average reward
- state action
- search space
- objective function
- policy gradient methods
- probabilistic model
- reinforcement learning algorithms
- hidden Markov models