VidLA: Video-Language Alignment at Scale.

Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

Published in: CoRR (2024)

Keyphrases

video data
video sequences
programming language
digital video
video streams
multimedia
key frames
video content
language learning
video clips
video analysis
real time
space time
spatial and temporal
multimedia data
natural language
image alignment
video database
neural network
similarity measure
event detection
motion estimation
feature vectors
video shots
object oriented programming
image sequences
specification language