Connecting Vision and Language with Video Localized Narratives.
Paul Voigtlaender, Soravit Changpinyo, Jordi Pont-Tuset, Radu Soricut, Vittorio Ferrari
Published in: CVPR (2023)
Keyphrases
- real time
- video data
- video content
- video sequences
- multimedia
- programming language
- natural language
- vision system
- video frames
- video clips
- language learning
- video streams
- online video
- language processing
- digital video
- video analysis
- computer vision
- video retrieval
- space-time
- spatio-temporal
- human vision
- key frames
- video database
- video search
- machine learning