VideoCon: Robust Video-Language Alignment via Contrast Captions.
Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover
Published in: CoRR (2023)
Keyphrases
- video content
- language learning
- video streams
- foreground background segmentation
- video sequences
- multimedia
- video data
- video frames
- natural language
- multimedia data
- programming language
- face recognition
- computer vision
- video processing
- partial occlusion
- event detection
- temporal information
- spatial and temporal
- high level
- video retrieval
- video images
- news video
- word level
- image classification