Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets.
Miroslav Strupl, Francesco Faccio, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh Kumar Srivastava
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- direct policy search
- stochastic approximation
- learning automata
- dynamic environments
- function approximation
- Monte Carlo
- optimal control
- real world
- stochastic optimization
- learning process
- multi agent reinforcement learning
- stochastic model
- control problems
- action selection
- control policies
- Markov decision processes
- machine learning