COPA: Efficient Vision-Language Pre-training Through Collaborative Object- and Patch-Text Alignment.

Published in: CoRR (2023)

Keyphrases