AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion.
Seungwoo Lee, Chaerin Kong, Donghyeon Jeon, Nojun Kwak
Published in: CoRR (2023)
Keyphrases
- text regions
- video frames
- text detection
- input image
- caption text
- video images
- document images
- video sequences
- video data
- video streams
- multimedia
- video analysis
- video files
- image features
- image data
- visual data
- digital video
- image content
- news video
- video signals
- semantic labels
- video content analysis
- multiscale
- image segmentation