MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration.
Thomas Hayes, Songyang Zhang, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, Devi Parikh
Published in: ECCV (8) (2022)
Keyphrases
- multimedia
- story segmentation
- news video
- audio content
- video content analysis
- multiple modalities
- audio video
- audio visual
- text generation
- video data
- multimodal fusion
- multimodal information
- text graphics
- multi modal
- digital video
- scene change detection
- visual data
- multimedia processing
- audio features
- video content
- video search
- broadcast news
- multimedia information
- video analysis
- content based video retrieval
- video database
- lecture videos
- video files
- natural language descriptions
- video scene
- audio files
- closed captions
- video sequences
- text documents
- soccer video
- text mining
- video frames
- video retrieval
- signal processing
- visual information
- multimedia databases
- multimedia data
- video collections
- text detection
- cross modal
- video segments
- audio stream
- text data
- audio visual content
- information retrieval
- digital audio
- event detection
- online video
- keywords
- audio signals
- text to speech
- video on demand
- video signals
- video clips