Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity.
Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas, Joan Serrà
Published in: CoRR (2024)
Keyphrases
- multimedia
- audio video
- scene change detection
- multimedia processing
- digital video
- video data
- visual data
- video content analysis
- video content
- multimedia information
- video files
- digital audio
- video material
- video sequences
- audio files
- video analysis
- video frames
- audio content
- audio stream
- audio features
- video streams
- media streams
- real time
- audio signals
- video database
- content based video retrieval
- broadcast news
- video recordings
- online video
- video copy detection
- generative model
- video search
- video segmentation
- human actions
- multimedia data
- visual information
- action recognition
- low level
- soccer video
- video signals
- video clips
- video indexing and retrieval
- video retrieval
- video surveillance
- multimodal fusion
- closed captions