Projected State-action Balancing Weights for Offline Reinforcement Learning.
Jiayi Wang, Zhengling Qi, Raymond K. W. Wong
Published in: CoRR (2021)
Keyphrases
- state action
- reinforcement learning
- evaluation function
- action space
- continuous state
- state space
- function approximators
- Markov decision process
- average reward
- function approximation
- stochastic games
- state transitions
- learning algorithm
- Markov decision processes
- model free
- policy gradient
- reward function
- reinforcement learning algorithms
- optimal control
- optimal policy
- dynamic programming
- machine learning
- learning automata
- action selection
- multi agent
- weight vector
- learning capabilities
- partially observable
- temporal difference
- transfer learning
- learning process