Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning.
Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
Published in: CoRR (2023)
Keyphrases
- key frames
- video streams
- video images
- multimedia
- video data
- video sequences
- static images
- video frames
- image data
- image content
- video segments
- visual data
- video analysis
- semantic labels
- video content
- pre-trained
- input image
- image retrieval
- image frames
- textual descriptions
- space time
- natural language descriptions
- single image
- images and video sequences
- multiscale
- image classification
- temporal continuity
- video clips
- image representation
- high resolution
- video search
- video files
- image features
- video scene
- edge detection
- semantic information
- news video
- computer vision
- caption text
- object motion
- video shots
- semantic concepts
- video retrieval
- video surveillance
- image segmentation
- keywords
- segmentation algorithm
- feature points
- low level
- image analysis