EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.
Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang
Published in: ICCV (2023)
Keyphrases
- video data
- video sequences
- video content
- real time
- data fusion
- video clips
- language learning
- video streams
- activity recognition
- visual saliency
- multimedia
- video summarization
- online video
- video frames
- information fusion
- video analysis
- video database
- training phase
- digital video
- training set
- computer vision
- video shots
- multi modal fusion
- temporal information
- human activities
- spatial and temporal
- space time
- online learning
- active learning
- face recognition