Everything at Once - Multi-modal Fusion Transformer for Video Retrieval.
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne
Published in: CVPR (2022)
Keyphrases
- video retrieval
- multi-modal fusion
- video database
- semantic gap
- visual content
- video indexing
- video search
- key frames
- content-based retrieval
- concept detection
- video data
- image and video retrieval
- video content
- retrieval systems
- video collections
- content-based video retrieval
- video clips
- video shots
- facial features
- interactive retrieval
- concept-based video retrieval
- information retrieval systems
- object recognition
- training data
- machine learning
- multi-modal
- feature vectors
- similarity measure
- e-learning