Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding.
Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li
Published in: CoRR (2023)
Keyphrases
- input image
- image analysis
- single image
- multiscale
- image retrieval
- image data
- conceptual models
- image features
- high resolution
- multimedia
- image frames
- visual cues
- Bayesian framework
- special case
- feature points
- random fields
- video clips
- test images
- visual data
- image segmentation
- video content
- region of interest
- video streams
- image content
- image representation
- programming language
- low level
- video sequences
- video images
- visual effects
- static images
- pre-trained
- video files
- object motion
- image collections
- spatial information
- segmentation method
- segmentation algorithm
- object recognition