The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition.
Tiancheng Jin, Longbo Huang, Haipeng Luo
Published in: CoRR (2021)
Keyphrases
- markov decision processes
- transition model
- state space
- state transition
- reinforcement learning
- monte carlo
- stochastic optimization
- multi agent
- dynamic programming
- continuous state spaces
- decision diagrams
- stochastic domains
- finite horizon
- learning automata
- episodic memory
- markov decision problems
- stochastic programming
- partially observable
- utility function
- objective function