Listen and Look: Multi-Modal Aggregation and Co-Attention Network for Video-Audio Retrieval.
Xiaoshuai Hao, Wanqian Zhang, Dayan Wu, Fei Zhu, Bo Li
Published in: ICME (2022)
Keyphrases
- multi modal
- cross modal
- audio visual
- audio visual content
- video search
- multimedia information
- semantic concepts
- multimedia
- audio video
- multi modality
- single modality
- visual data
- content based video retrieval
- broadcast news
- image database
- multimedia data
- audio content
- video content
- visual information
- audio features
- multiple modalities
- video database
- audio files
- multimedia databases
- video data
- image retrieval
- media streams
- multimedia documents
- visual similarity
- multimedia retrieval
- video analysis
- image annotation
- video sequences