COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment.
Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Ji Zhang
Published in: ACM Multimedia (2023)
Keyphrases
- computer vision
- language generation
- keywords
- information retrieval
- language learning
- english language
- object segmentation
- text retrieval
- image patches
- programming language
- image processing
- natural language
- object model
- gradient orientation
- native language
- english text
- machine translation system
- decision trees
- computational linguistics
- bounding box
- object detection
- complex objects
- training set
- moving objects
- information extraction
- 3D objects
- natural language processing