Video and Audio are Images: A Cross-Modal Mixer for Original Data on Video-Audio Retrieval.
Zichen Yuan, Qi Shen, Bingyi Zheng, Yuting Liu, Linying Jiang, Guibing Guo
Published in: CoRR (2023)
Keyphrases
- cross modal
- visual data
- original data
- image retrieval
- visual similarity
- multimedia
- multiple modalities
- multi modal
- high dimensional data
- video data
- multimedia retrieval
- perceptual information
- video sequences
- multimedia data
- content based retrieval
- visual concepts
- visual information
- semantic concepts
- multimedia databases
- image annotation
- data sets
- image collections
- image data
- image database
- raw data
- video content
- input data
- visual recognition
- video analysis
- input image
- image sequences
- support vector
- high dimensional
- web images
- video streams
- video retrieval
- visual features
- visual content
- multimedia documents
- image search
- pattern recognition
- relevance feedback
- information retrieval systems
- multimedia information retrieval
- database