On the connection between Bregman divergence and value in regularized Markov decision processes.
Brendan O'Donoghue
Published in: CoRR (2022)
Keyphrases
- Markov decision processes
- Bregman divergences
- cost sensitive
- information theoretic
- maximum entropy
- Mahalanobis distance
- policy iteration
- finite state
- theoretical guarantees
- nearest neighbor
- learning theory
- exponential family
- loss function
- reinforcement learning
- state space
- KL divergence
- optimal policy
- dynamic programming
- special case
- nonnegative matrix factorization
- boosting algorithms
- average cost
- reinforcement learning algorithms
- infinite horizon
- data dependent
- average reward
- linear programming
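As context for the paper's topic, a minimal sketch (not taken from the paper itself) of the Bregman divergence generated by a convex function, showing that the negative-entropy generator recovers the KL divergence listed among the keyphrases. The function names here are illustrative, not from any established API.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

# Negative-entropy generator: phi(x) = sum_i x_i log x_i.
neg_entropy = lambda x: np.sum(x * np.log(x))
grad_neg_entropy = lambda x: np.log(x) + 1.0

# Two probability vectors (e.g. two policies' action distributions in an MDP state).
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# For probability vectors, the negative-entropy Bregman divergence
# coincides with the KL divergence sum_i p_i log(p_i / q_i).
d = bregman(neg_entropy, grad_neg_entropy, p, q)
kl = np.sum(p * np.log(p / q))
print(abs(d - kl) < 1e-12)
```

Other generators yield other keyphrase entries as special cases; for instance, a quadratic generator phi(x) = x' A x / 2 with positive-definite A gives a squared Mahalanobis-type distance.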