Q-Learning for Continuous State and Action MDPs under Average Cost Criteria.
Ali Devran Kara, Serdar Yüksel. Published in: CoRR (2023)
Keyphrases
- average cost
- continuous state
- Markov decision processes
- action space
- state action
- optimal policy
- continuous state and action spaces
- finite state
- reinforcement learning
- state space
- average reward
- continuous state spaces
- Markov decision process
- reinforcement learning algorithms
- policy iteration
- continuous action
- long run
- initial state
- finite horizon
- state dependent
- control policies
- stochastic games
- infinite horizon
- action sets
- reward function
- policy search
- partially observable Markov decision processes
- action selection
- dynamic programming
- decision problems
- approximate dynamic programming
- control policy
- Markov chain
- heuristic search
- optimal control
- finite number
- model checking
- Markov decision problems
- multistage
- function approximators
- planning problems
- search space
- search algorithm
- special case
- partially observable
- linear programming
- dynamical systems
- decision making