A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video.
Keito Kudo, Haruki Nagasawa, Jun Suzuki, Nobuyuki Shimizu
Published in: CoRR (2023)
Keyphrases
- object tracking
- video sequences
- key frames
- video summaries
- video summarization
- video content
- surveillance videos
- video retrieval
- video data
- video shots
- video browsing
- video database
- visual features
- audio visual
- news video
- video clips
- video frames
- image sequences
- caption text
- visual content
- video segments
- video streams
- low level features
- video search
- multi modal
- video analysis
- visual data
- visual cues
- feature vectors
- content based retrieval
- visual information
- image classification
- image features