On Uninformative Optimal Policies in Adaptive LQR with Unknown B-Matrix.
Ingvar Ziemann, Henrik Sandberg
Published in: L4DC (2021)
Keyphrases
- optimal policy
- Markov decision processes
- dynamic programming
- state space
- decision problems
- finite horizon
- infinite horizon
- multistage
- finite state
- reinforcement learning
- policy iteration
- long run
- average reward
- average reward reinforcement learning
- optimal control
- sufficient conditions
- average cost
- dynamic programming algorithms
- control policies
- serial inventory systems
- Bayesian reinforcement learning
- state dependent
- initial state
- partially observable Markov decision processes
- Monte Carlo
- data mining
- computational complexity
- partially observed
- reinforcement learning algorithms
- Markov chain