Measuring Exploration in Reinforcement Learning via Optimal Transport in Policy Space.
Reabetswe M. Nkhumise, Debabrota Basu, Tony J. Prescott, Aditya Gilra
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- action selection
- action space
- optimal policy
- control policy
- asymptotically optimal
- dynamic programming
- markov decision processes
- control policies
- optimal control
- policy search
- function approximation
- expected cost
- active exploration
- function approximators
- total reward
- search space
- markov decision process
- finite horizon
- exploration strategy
- state space
- optimal solution
- approximate dynamic programming
- exploration exploitation tradeoff
- average reward
- model based reinforcement learning
- average cost
- worst case
- partially observable
- reinforcement learning algorithms
- state and action spaces
- machine learning
- space time
- transition model
- low dimensional
- supply chain
- multi agent
- partially observable environments
- objective function