Beyond Expected Return: Accounting for Policy Reproducibility When Evaluating Reinforcement Learning Algorithms.
Manon Flageat, Bryan Lim, Antoine Cully. Published in: AAAI (2024)
Keyphrases
- reinforcement learning algorithms
- reinforcement learning problems
- policy search
- reinforcement learning
- partially observable environments
- markov decision processes
- reward function
- state space
- model free
- total reward
- policy gradient
- optimal policy
- eligibility traces
- reinforcement learning methods
- temporal difference
- markov games
- learning algorithm
- function approximators
- stochastic games
- function approximation
- dynamic environments
- markov decision process
- policy iteration
- action space
- markov decision problems
- policy evaluation
- action selection
- multiagent reinforcement learning
- partially observable
- inverse reinforcement learning
- multi agent
- data mining
- infinite horizon
- supervised learning
- training data
- machine learning