Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them to Actions.
Jürgen Schmidhuber
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- reward function
- perceptual aliasing
- action selection
- partially observable
- action space
- Markov decision processes
- state action
- state space
- learning agent
- state and action spaces
- predicting future
- reinforcement learning algorithms
- partially observable domains
- initially unknown
- model free
- maximum a posteriori
- policy search
- optimal policy
- reward signal
- neural network
- partial observability
- learned knowledge
- multi agent
- control policy
- learning algorithm
- machine learning
- behavioural cloning
- decision theoretic
- learning process
- dynamic programming
- function approximation
- Markov decision process
- multiple agents
- temporal difference
- plan recognition
- learning capabilities
- multiagent reinforcement learning
- agent learns
- optimal control
- complex domains
- total reward
- autonomous agents
- average reward
- initial state