GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection.
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin
Published in: AAAI (2024)
Keyphrases
- object detection
- object categories
- visual categorization
- computer vision
- visual perception
- discriminatively trained
- visual information
- training examples
- language learning
- human vision
- vision system
- multi class
- visual vocabulary
- category level
- object detectors
- object class
- real time
- scene understanding
- visual processing
- visual field
- programming language
- object recognition
- background subtraction
- low level
- training set
- natural language
- training samples
- visual features
- training process
- visual scene
- scene recognition
- image processing
- contextual cues
- boosted classifiers
- active vision
- collective intelligence
- pedestrian detection
- supervised learning
- keywords
- high level
- machine learning
- neural network