VindLU: A Recipe for Effective Video-and-Language Pretraining.
Feng Cheng, Xizi Wang, Jie Lei, David J. Crandall, Mohit Bansal, Gedas Bertasius
Published in: CoRR (2022)
Keyphrases
- video data
- video analysis
- real time
- natural language
- high resolution
- object oriented programming
- spatial and temporal
- spatio temporal
- video sequences
- multimedia
- neural network
- relational databases
- programming language
- video streams
- knowledge base
- temporal information
- language learning
- video clips
- artificial intelligence