VIMI: Grounding Video Generation through Multi-modal Instruction.
Yuwei Fang
Willi Menapace
Aliaksandr Siarohin
Tsai-Shien Chen
Kuan-Chien Wang
Ivan Skorokhodov
Graham Neubig
Sergey Tulyakov
Published in: CoRR (2024)
Keyphrases
multi modal
semantic concepts
video search
multimedia
video sequences
video data
video content
multiple modalities
high dimensional
cross modal
multi modality
video streams
audio visual
uni modal
multimedia data
key frames
video frames
humanoid robot
video database
video shots
feature space