Combinatorial Pure Exploration with Bottleneck Reward Function and its Extension to General Reward Functions.
Yihan Du, Yuko Kuroki, Wei Chen
Published in: CoRR (2021)
Keyphrases
- reward function
- Markov decision processes
- reinforcement learning
- state space
- inverse reinforcement learning
- multiple agents
- reinforcement learning algorithms
- optimal policy
- partially observable
- policy search
- state variables
- simple examples
- hierarchical reinforcement learning
- generative model
- initially unknown
- machine learning
- Markov decision process
- state action
- control policies
- transition model