Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective.
Yingying Fan, Yu Wu, Bo Du, Yutian Lin
Published in: NeurIPS (2023)
Keyphrases
- audio visual
- weakly supervised
- visual data
- multimedia
- multi modal
- natural language
- topic models
- visual information
- object class
- video data
- relation extraction
- superpixels
- semi supervised
- video frames
- high dimensional data
- multimedia data
- video sequences
- object detectors
- named entities
- high level
- natural language processing
- object detection
- low level