A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video.
Keito Kudo, Haruki Nagasawa, Jun Suzuki, Nobuyuki Shimizu
Published in: EMNLP (2023)
Keyphrases
- key frames
- video summaries
- video summarization
- video content
- video retrieval
- visual features
- audio visual
- video browsing
- video database
- video shots
- video data
- news video
- video sequences
- caption text
- video clips
- video frames
- visual content
- video streams
- low level features
- feature vectors
- multi modal
- video segments
- event detection
- video search
- visual information
- visual cues
- quality assessment
- content based retrieval
- surveillance videos
- sports video
- image classification
- visual data
- video analysis
- low level
- multimedia