MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers.

Tanvir Mahmud, Shentong Mo, Yapeng Tian, Diana Marculescu
Published in: CoRR (2024)
Keyphrases
  • audio visual
  • multi modal
  • visual information
  • multi stream
  • multimedia
  • image database
  • visual data
  • emotion recognition
  • data sets
  • pattern recognition
  • image classification
  • audio features
  • temporal context