EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.
Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang
Published in: CoRR (2023)
Keyphrases
- multiresolution
- image fusion
- video sequences
- video content
- video data
- language learning
- visual saliency
- multimedia
- training process
- real time
- test set
- video streams
- video frames
- video database
- video analysis
- video clips
- data fusion
- training samples
- online learning
- programming language
- video retrieval
- video summarization
- training set
- natural language
- training phase
- online video
- fusion method
- information fusion
- multimedia data
- training examples