RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension.

Qiang Zhou, Chaohui Yu, Shaofeng Zhang, Sitong Wu, Zhibing Wang, Fan Wang
Published in: CoRR (2023)
Keyphrases
  • multi modal
  • multi modality
  • image annotation
  • high dimensional
  • audio visual
  • state space
  • computer vision
  • semantic concepts