V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.
Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo. Published in: CoRR (2024)
Keyphrases
- cross modal
- video summarization
- multi modal
- audio visual
- visual data
- video content
- video data
- multimedia retrieval
- event detection
- key frames
- surveillance videos
- low level features
- image retrieval
- multimedia
- spatio temporal
- video sequences
- visual similarity
- multimedia databases
- video retrieval
- feature vectors
- video frames
- high level
- semantic information
- contextual information