MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration.
Thomas Hayes, Songyang Zhang, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, Devi Parikh
Published in: CoRR (2022)
Keyphrases
- multimedia
- story segmentation
- news video
- audio content
- audio video
- audio visual
- video content analysis
- multiple modalities
- multimodal fusion
- multimodal information
- video data
- scene change detection
- text generation
- broadcast news
- text graphics
- multimedia processing
- information retrieval
- visual data
- digital video
- natural language descriptions
- video search
- closed captions
- content based video retrieval
- multi modal
- video streams
- text detection
- media streams
- audio visual content
- audio features
- video sequences
- video retrieval
- video collections
- video database
- cross modal
- video content
- multimedia data
- multimedia documents
- video material
- video files
- visual information
- audio stream
- video analysis
- text mining
- video segments
- text documents
- video frames
- multimedia databases
- spoken documents
- video clips
- multimedia content
- keywords
- lecture videos
- news stories
- audio files
- multimedia information
- text to speech
- video scene
- single modality
- digital audio
- textual descriptions
- multimodal interfaces
- online video