HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues.
Ankit Jha, Debabrata Pal, Mainak Singha, Naman Agarwal, Biplab Banerjee
Published in: CoRR (2023)
Keyphrases
- audio visual
- pattern recognition
- multi modal
- multimodal fusion
- temporal segmentation
- image classification
- visual data
- visual information
- temporal context
- sports video
- feature set
- feature space
- feature selection
- video summarization
- emotion recognition
- face recognition
- visual features
- feature extraction
- multimedia
- metadata
- low dimensional
- data sets
- co occurrence
- image data
- video sequences
- multi stream
- machine learning