DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies.
Soroush Nasiriany, Vitchyr H. Pong, Ashvin Nair, Alexander Khazatsky, Glen Berseth, Sergey Levine
Published in: ICRA (2021)
Keyphrases
- reinforcement learning
- general purpose
- optimal policy
- policy search
- control policies
- state space
- markov decision processes
- markov decision process
- reinforcement learning algorithms
- function approximation
- control policy
- model free
- reward function
- markov decision problems
- special purpose
- hierarchical reinforcement learning
- reinforcement learning agents
- multi agent
- fitted q iteration
- partially observable markov decision processes
- total reward
- long run
- control problems
- action space
- temporal difference learning
- semi markov decision process
- direct policy search
- temporal difference
- policy iteration
- approximate policy iteration
- autonomous learning
- multiagent reinforcement learning
- partially observable domains
- temporal difference methods
- learning algorithm
- action selection
- decision problems
- dynamic programming
- multi agent reinforcement learning
- continuous state
- learning agent
- state and action spaces
- finite state
- probability distribution
- learning process
- machine learning
- tabula rasa