Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness.

Liangliang Cao, Bowen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng
Published in: CoRR (2023)
Keyphrases
  • text detection
  • text regions
  • supervised learning
  • training examples
  • video clips
  • training set
  • face detection