Beyond Cumulative Returns via Reinforcement Learning over State-Action Occupancy Measures.
Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
Published in: ACC (2021)
Keyphrases
- state action
- reinforcement learning
- evaluation function
- action space
- continuous state
- Markov decision process
- function approximators
- stochastic games
- average reward
- function approximation
- state space
- learning algorithm
- model free
- Markov decision processes
- action selection
- state transitions
- policy gradient
- learning automata
- belief state
- reward function
- policy iteration
- learning capabilities
- reinforcement learning algorithms
- feature space