The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition.
Tiancheng Jin, Longbo Huang, Haipeng Luo
Published in: NeurIPS (2021)
Keyphrases
- markov decision processes
- state transition
- state space
- transition model
- reinforcement learning
- state transitions
- multi agent
- stochastic domains
- decision diagrams
- stochastic model
- finite horizon
- stochastic optimization
- monte carlo
- dynamic programming
- neural network
- average reward
- stochastic programming
- planning under uncertainty
- markov decision problems
- reward function
- semi markov decision processes