VideoXum: Cross-modal Visual and Textural Summarization of Videos.
Jingyang Lin, Hang Hua, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo
Published in: CoRR (2023)
Keyphrases
- cross modal
- multi modal
- visual data
- video search
- multimedia retrieval
- visual recognition
- video sequences
- perceptual information
- image retrieval
- multimedia databases
- video frames
- visual similarity
- human activities
- video analysis
- video data
- video content
- semantic concepts
- visual information
- multimedia data
- human actions
- natural language processing
- information retrieval