Connecting Vision and Language with Video Localized Narratives.
Paul Voigtlaender, Soravit Changpinyo, Jordi Pont-Tuset, Radu Soricut, Vittorio Ferrari
Published in: CoRR (2023)
Keyphrases
- real time
- video data
- real time video
- programming language
- video content
- multimedia
- video streams
- language learning
- video sequences
- computer vision
- vision system
- natural language
- video analysis
- specification language
- space time
- video clips
- video database
- knowledge base
- data sets
- video search
- video processing
- visual perception
- language processing
- video segmentation
- key frames
- multimedia data
- spatial and temporal
- image processing
- multi agent
- object oriented
- low level