VidLA: Video-Language Alignment at Scale.
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi
Published in: CoRR (2024)
Keyphrases
- video data
- video sequences
- programming language
- digital video
- video streams
- multimedia
- key frames
- video content
- language learning
- video clips
- video analysis
- real time
- space time
- spatial and temporal
- multimedia data
- natural language
- image alignment
- video database
- neural network
- similarity measure
- event detection
- motion estimation
- feature vectors
- video shots
- object oriented programming
- image sequences
- specification language