A Policy Gradient Method for Confounded POMDPs.
Mao Hong, Zhengling Qi, Yanxun Xu
Published in: ICLR (2024)
Keyphrases
- gradient method
- policy gradient
- actor critic
- policy search
- convergence rate
- step size
- optimization methods
- partially observable markov decision processes
- reinforcement learning
- convex formulation
- negative matrix factorization
- partially observable
- optimal policy
- function approximation
- temporal difference
- information retrieval systems
- wavelet transform
- multiresolution
- evolutionary algorithm
- pairwise
- image segmentation
- point based value iteration