CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction.

Published in: ICLR (2024)
