Performance of NPG in Countable State-Space Average-Cost RL.
Yashaswini Murthy, Isaac Grosof, Siva Theja Maguluri, R. Srikant
Published in: CoRR (2024)
Keyphrases
- average cost
- state space
- Markov decision processes
- optimal policy
- reinforcement learning
- infinite horizon
- reinforcement learning algorithms
- finite state
- action space
- dynamic programming
- long run
- control policy
- finite horizon
- Markov decision process
- approximate dynamic programming
- policy iteration
- decision problems
- Markov decision chains
- Markov chain
- initial state
- state variables
- average reward
- finite number
- action sets
- heuristic search
- partially observable
- optimal control
- inventory models
- Markov decision problems
- planning problems
- multistage
- dynamical systems
- control policies
- risk sensitive
- function approximation
- continuous state spaces
- reward function
- model free
- learning agent
- learning algorithm
- lower bound
- Bayesian networks