Determining the optimal temperature parameter for Softmax function in reinforcement learning.
Yu-Lin He, Xiaoliang Zhang, Wei Ao, Joshua Zhexue Huang
Published in: Appl. Soft Comput. (2018)
Keyphrases
- reinforcement learning
- single parameter
- control policy
- optimal control
- function approximators
- dynamic programming
- temporal difference learning
- optimal parameters
- Lp norm
- parameter space
- piecewise linear
- semi infinite programming
- machine learning
- approximate dynamic programming
- threshold values
- linear model
- parameter values
- convergence rate
- closed form
- Markov decision processes
- optimal solution