Discriminator Soft Actor Critic without Extrinsic Rewards.
Daichi Nishio, Toi Tsuneda, Daiki Kuyoshi, Satoshi Yamane
Published in: GCCE (2020)
Keyphrases
- actor critic
- reinforcement learning
- reinforcement learning algorithms
- Markov decision processes
- temporal difference
- approximate dynamic programming
- policy iteration
- policy gradient
- function approximation
- state space
- optimal control
- gradient method
- neuro fuzzy
- reward function
- machine learning
- optimal policy
- average reward
- dynamic programming
- control policy
- model free
- learning problems
- supervised learning
- belief state
- linear programming
- multi agent