TA2V: Text-Audio Guided Video Generation.
Minglu Zhao, Wenmin Wang, Tongbao Chen, Rui Zhang, Ruochen Li
Published in: IEEE Trans. Multim. (2024)
Keyphrases
- audio visual
- audio features
- visual data
- multimedia
- multi modal
- audio content
- text generation
- sports video
- video search
- visual information
- video segments
- video data
- multimedia documents
- video content
- natural language descriptions
- multiple modalities
- video sequences
- video database
- natural language generation
- video frames
- video content analysis
- video clips
- news video
- text detection
- video streams
- digital video
- video collections
- information retrieval
- text mining
- video retrieval
- multimedia data
- video analysis
- video material
- soccer video
- low level
- scene change detection
- media streams
- video files
- audio stream
- text graphics
- real time
- multimedia processing
- story segmentation
- digital audio
- multimedia information
- image sequences
- cross media
- content based video retrieval
- textual descriptions
- audio video
- signal processing
- audio signals
- text to speech
- event detection
- broadcast news
- key frames