Temporal Bilinear Encoding Network of Audio-Visual Features at Low Sampling Rates.
Feiyan Hu, Eva Mohedano, Noel E. O'Connor, Kevin McGuinness
Published in: CoRR (2020)
Keyphrases
- visual features
- visual information
- sampling rate
- visual data
- visual content
- image classification
- audio features
- low level
- low level features
- image retrieval
- image search
- image annotation
- content based video retrieval
- visual appearance
- keywords
- image collections
- semantic concepts
- semantic gap
- bag of features
- key frames
- web images
- temporal information
- visual similarity
- acoustic features
- audio visual
- visual properties
- global features
- bridge the semantic gap
- search engine
- video shots
- semantic information
- multi modal
- video sequences
- high level
- multimedia
- computer vision