Multimodal Imbalance-Aware Gradient Modulation for Weakly-Supervised Audio-Visual Video Parsing.
Jie Fu, Junyu Gao, Bing-Kun Bao, Changsheng Xu
Published in: IEEE Trans. Circuits Syst. Video Technol. (2024)
Keyphrases
- audio visual
- weakly supervised
- visual data
- multimedia
- multi modal
- multimodal fusion
- multi stream
- visual information
- topic models
- object class
- superpixels
- video sequences
- video data
- semi supervised
- relation extraction
- video frames
- natural language
- key frames
- multimedia data
- object detectors
- visual features
- data sets
- named entities
- natural language processing
- multiscale
- image processing