Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing.
Jianning Wu, Zhuqing Jiang, Shiping Wen, Aidong Men, Haiying Wang
Published in: CoRR (2021)
Keyphrases
- multimodal fusion
- audio visual
- weakly supervised
- multi modal
- visual data
- visual information
- multimedia
- topic models
- high robustness
- relation extraction
- natural language
- multimodal interfaces
- relevance feedback
- natural language processing
- human computer interaction
- named entities
- information extraction
- multimedia data
- video sequences
- high level
- data sets