AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization.
Longxiang He, Li Shen, Junbo Tan, Xueqian Wang
Published in: CoRR (2024)
Keyphrases
- constrained optimization
- optimal policy
- action selection
- reinforcement learning
- constrained optimization problems
- constraint handling
- penalty function
- objective function
- policy iteration
- markov decision processes
- state action
- function approximation
- state space
- reward function
- unconstrained optimization
- markov decision process
- continuous state spaces
- multi agent
- inequality constraints
- augmented lagrangian
- interval analysis
- penalty functions
- reinforcement learning algorithms
- iterative methods
- learning algorithm
- function approximators
- action space
- least squares