Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms.
Manon Flageat, Bryan Lim, Antoine Cully
Published in: CoRR (2023)
Keyphrases
- reinforcement learning algorithms
- reinforcement learning problems
- policy search
- partially observable environments
- reinforcement learning
- reward function
- markov decision processes
- state space
- eligibility traces
- optimal policy
- markov games
- total reward
- model free
- policy gradient
- temporal difference
- reinforcement learning methods
- learning algorithm
- function approximation
- markov decision process
- policy iteration
- function approximators
- partially observable
- action selection
- policy evaluation
- inverse reinforcement learning
- dynamic environments
- infinite horizon
- multiagent reinforcement learning
- stochastic games
- markov decision problems
- artificial neural networks
- tabula rasa
- control problems
- multiple agents
- search space