VATMAN: Video-Audio-Text Multimodal Abstractive Summarization with Trimodal Hierarchical Multi-head Attention.
Doosan Baek, Jiho Kim, Hongchul Lee
Published in: ICTC (2023)
Keyphrases
- multimedia
- story segmentation
- audio visual
- video summarization
- soccer video
- news video
- video search
- audio content
- audio video
- video content analysis
- sports video
- closed captions
- video data
- broadcast news
- multiple modalities
- topic segmentation
- multi modal
- multimodal information
- multimodal fusion
- video content
- digital video
- text graphics
- content based video retrieval
- scene change detection
- video streams
- video analysis
- multimedia processing
- visual data
- text summarization
- real time
- video browsing
- video sequences
- video frames
- natural language descriptions
- automatic summarization
- video segments
- audio features
- cross modal
- video files
- extractive summarization
- video scene
- multimedia data
- text detection
- multimedia information
- news stories
- visual information
- video retrieval
- automatic text summarization
- audio files
- video material
- digital audio
- lecture videos
- video annotation
- audio stream
- information retrieval
- spoken documents
- video summaries
- video clips
- visual features
- lexical chains
- multi stream
- text mining
- keywords
- text to speech
- video signals
- multi document summarization
- surveillance videos
- video database
- visual content
- visual attention