Predictive Estimation for Reinforcement Learning with Time-Varying Reward Functions.
Abolfazl Hashemi, Antesh Upadhyay. Published in: ACSSC (2023)
Keyphrases
- reward function
- reinforcement learning
- reinforcement learning algorithms
- policy search
- Markov decision processes
- optimal policy
- state space
- Markov decision process
- inverse reinforcement learning
- partially observable
- multiple agents
- temporal difference
- model free
- simple examples
- function approximation
- machine learning
- transition probabilities
- Markov chain
- dynamic programming
- multi agent
- state action
- transition model
- data mining
- particle filter
- learning agent
- learning algorithm