Predictive reinforcement learning in non-stationary environments using weighted mixture policy.
Hossein Pourshamsaei, Amin Nobakhti
Published in: Appl. Soft Comput. (2024)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- markov decision process
- action selection
- function approximators
- control policies
- policy gradient
- policy iteration
- control policy
- function approximation
- markov decision processes
- reinforcement learning algorithms
- reward function
- actor critic
- policy evaluation
- state and action spaces
- reinforcement learning problems
- state space
- markov decision problems
- mixture model
- model free
- approximate dynamic programming
- partially observable
- partially observable environments
- policy gradient methods
- temporal difference
- action space
- machine learning
- non stationary
- partially observable markov decision processes
- rl algorithms
- order statistics
- learning algorithm
- probabilistic model
- transition model
- expectation maximization
- decision problems
- state action
- average reward
- predictive model
- learning problems
- continuous state
- optimal control
- long run
- continuous state spaces
- inverse reinforcement learning
- dynamic programming
- agent learns
- learning process
- neural network