Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting.
Syed Talal Wasim, Muzammal Naseer, Salman H. Khan, Fahad Shahbaz Khan, Mubarak Shah
Published in: CoRR (2023)
Keyphrases
- video clips
- video segments
- video database
- video data
- key frames
- video content
- video streams
- video frames
- multiple modalities
- low level features
- video retrieval
- multimedia documents
- video collections
- video sequences
- multi modal
- multimodal information
- video search
- event detection
- text mining
- long video
- information retrieval
- spatial and temporal
- text documents
- multimedia
- space time
- news video
- natural language descriptions
- audio content
- video surveillance
- video shots
- visual information
- text detection
- web documents
- information extraction
- low level