VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding.
Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Fahad Shahbaz Khan
Published in: CoRR (2024)
Keyphrases
- video sequences
- video data
- video streams
- video images
- video content
- image features
- input image
- multimedia
- visual data
- real time
- image content
- single image
- key frames
- image frames
- multiscale
- image segmentation
- visual cues
- compressed video
- video frames
- pre trained
- video files
- image analysis
- image retrieval
- image classification
- video retrieval
- object motion
- low level
- real time video
- camera movement
- images and video sequences
- weakly labeled
- image regions
- image representation
- segmentation method
- spatial information
- image processing
- image sequences
- static images
- quality metrics
- image data
- edge detection
- video analysis
- dynamic scenes
- video clips
- image collections
- super resolution
- feature points