UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection.
Ye Liu, Siyuan Li, Yang Wu, Chang Wen Chen, Ying Shan, Xiaohu Qie
Published in: CoRR (2022)
Keyphrases
- multi modal
- video search
- cut detection
- semantic concepts
- cross modal
- video indexing
- video data
- multi modality
- event detection
- audio visual
- multimedia documents
- information retrieval systems
- video database
- video sequences
- multimedia
- video frames
- image annotation
- video content
- video analysis
- video streams
- image search
- video shots
- news video
- uni modal
- fusing multiple
- multiple modalities
- visual data
- multimedia retrieval
- multimedia databases
- relevance feedback
- high level