Temporal Bilinear Encoding Network of Audio-visual Features at Low Sampling Rates.
Feiyan Hu, Eva Mohedano, Noel E. O'Connor, Kevin McGuinness
Published in: VISIGRAPP (5: VISAPP) (2021)
Keyphrases
- visual features
- visual information
- sampling rate
- visual data
- image classification
- audio features
- visual content
- low level
- low level features
- image annotation
- image retrieval
- semantic concepts
- temporal information
- acoustic features
- semantic gap
- keywords
- image search
- image collections
- visual appearance
- bridge the semantic gap
- key frames
- multimedia
- audio visual
- bag of features
- visual similarity
- multi modal
- content based video retrieval