On Pursuit of Designing Multi-modal Transformer for Video Grounding.
Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, Yuexian Zou
Published in: CoRR (2021)
Keyphrases
- multi modal
- video search
- semantic concepts
- audio visual
- multimedia
- multi modality
- video content
- video sequences
- video data
- video frames
- multiple modalities
- cross modal
- video clips
- visual data
- video streams
- high dimensional
- video database
- video analysis
- image processing
- video retrieval
- video shots
- fault diagnosis
- fusing multiple
- humanoid robot
- key frames
- spatial and temporal