Everything at Once - Multi-modal Fusion Transformer for Video Retrieval.
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne
Published in: CoRR (2021)
Keyphrases
- video retrieval
- multi-modal fusion
- visual content
- video database
- video indexing
- content-based retrieval
- semantic gap
- concept detection
- video data
- video search
- facial features
- retrieval systems
- video content
- concept-based video retrieval
- key frames
- video collections
- image and video retrieval
- video clips
- video shots
- face detection
- metadata
- interactive retrieval
- semantic video
- training data