GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection.
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin
Published in: CoRR (2023)
Keyphrases
- object detection
- object categories
- computer vision
- visual categorization
- visual perception
- training examples
- discriminatively trained
- vision system
- visual vocabulary
- human vision
- training set
- object detectors
- programming language
- category level
- face detection
- scene understanding
- natural language
- visual features
- low level
- visual field
- visual words
- language learning
- boosted classifiers
- visual processing
- scene recognition
- visual scene
- pedestrian detection
- object class
- image processing
- object classes
- visual information
- visual input
- closed world
- multi class
- object recognition
- training process
- visual concepts
- machine learning
- background subtraction
- human body
- training samples
- supervised learning