On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP.
Tianhao Wu, Yunchang Yang, Simon S. Du, Liwei Wang
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- markov decision processes
- optimal policy
- markov decision process
- multi agent
- state space
- reward function
- action sets
- reinforcement learning algorithms
- state and action spaces
- function approximation
- partially observable
- markov decision problems
- continuous state
- dynamic programming
- learning algorithm
- reinforcement learning methods
- policy iteration
- bayesian reinforcement learning
- neural network
- model free
- block size
- temporal difference
- supervised learning
- policy search
- decision problems
- action space
- machine learning
- learning process
- temporal difference learning
- long run
- average reward
- linear program
- linear programming
- decision makers
- dynamic programming algorithms
- learning problems
- initial state
- partially observable markov decision processes
- total reward