Active Learning of Markov Decision Processes using Baum-Welch algorithm (Extended).
Giovanni Bacci, Anna Ingólfsdóttir, Kim G. Larsen, Raphaël Reynouard
Published in: CoRR (2021)
Keyphrases
- markov decision processes
- dynamic programming
- learning algorithm
- objective function
- active learning
- np hard
- model based reinforcement learning
- finite state
- expectation maximization
- state space
- k means
- optimal solution
- policy iteration
- average reward
- hidden markov models
- search space
- memory efficient
- reinforcement learning