Policy Learning of MDPs with Mixed Continuous/Discrete Variables: A Case Study on Model-Free Control of Markovian Jump Systems.
Joao Paulo Jansch-Porto, Bin Hu, Geir E. Dullerud
Published in: L4DC (2020)
Keyphrases
- reinforcement learning
- model free
- discrete variables
- policy iteration
- action selection
- learning tasks
- machine learning
- rl algorithms
- average reward
- learning problems
- learning algorithm
- markov decision processes
- temporal difference
- active learning
- action space
- prior knowledge
- neural network
- policy evaluation
- optimal policy
- supervised learning
- state space