Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation.
Shiqi Yang
Zhi Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
Published in:
CoRR (2024)
Keyphrases
audio visual
visual information
visual data
multi modal
visual features
emotion recognition
spatio temporal
visual content
temporal context
audio visual speech recognition
low level
databases
multimedia content
multimodal fusion
multi stream
video summarization
image classification
multimedia
computer vision