The proposal for real-time sequential-decision for optimal action using flexible-weight coefficient based on the state-action pair.
Masashi Sugimoto, Kentarou Kurashige
Published in: CEC (2015)
Keyphrases
- state action
- average reward
- evaluation function
- reinforcement learning
- Markov decision process
- stochastic games
- action space
- Markov decision processes
- optimal policy
- long run
- dynamic programming
- state transitions
- optimal control
- belief state
- decision making
- machine learning
- temporal difference
- function approximators
- state space
- reward function
- function approximation
- model free
- neural network
- fixed point
- optimal solution