MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers.
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
Published in:
CoRR (2024)
Keyphrases
audio visual
multi modal
visual information
multi stream
multimedia
image database
visual data
emotion recognition
data sets
pattern recognition
image classification
audio features
temporal context