Combinatorial Pure Exploration with Bottleneck Reward Function.
Yihan Du, Yuko Kuroki, Wei Chen
Published in: NeurIPS (2021)
Keyphrases
- maximum entropy
- reward function
- Markov models
- transition probabilities
- reinforcement learning
- inverse reinforcement learning
- Markov decision processes
- state space
- reinforcement learning algorithms
- optimal policy
- multiple agents
- partially observable
- conditional random fields
- transition model
- initially unknown
- hierarchical reinforcement learning
- state variables
- higher order
- Markov decision process
- minimax regret