Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing.
Jie Fu, Junyu Gao, Changsheng Xu
Published in: CoRR (2023)
Keyphrases
- audio visual
- weakly supervised
- visual data
- multimedia
- multi modal
- multimodal fusion
- visual information
- multi stream
- relation extraction
- topic models
- video data
- superpixels
- video sequences
- object class
- named entities
- object detectors
- semi supervised
- video frames
- natural language processing
- object recognition
- multimedia data
- image data
- natural language
- contextual information
- co occurrence
- key frames
- 3d objects
- high dimensional