Real-time sequentially decision for optimal action using prediction of the state-action pair.
Masashi Sugimoto, Kentarou Kurashige
Published in: MHS (2014)
Keyphrases
- state action
- average reward
- evaluation function
- reinforcement learning
- action space
- stochastic games
- optimal policy
- state transitions
- Markov decision process
- belief state
- long run
- Markov decision processes
- decision making
- optimal solution
- reward function
- optimal control
- dynamic programming
- decision problems
- function approximators
- real-valued
- Markov chain