Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective.
Yingying Fan, Yu Wu, Yutian Lin, Bo Du
Published in: CoRR (2023)
Keyphrases
- audio visual
- weakly supervised
- visual data
- multimedia
- natural language
- multi modal
- video data
- visual information
- relation extraction
- object class
- topic models
- semi supervised
- superpixels
- object detectors
- video frames
- video sequences
- key frames
- contextual information
- natural language processing
- low level
- object recognition
- named entities
- machine learning
- multimedia data
- visual features
- co occurrence
- information extraction
- high dimensional
- feature selection