Cross Modal Video Representations for Weakly Supervised Active Speaker Localization.
Rahul Sharma, Krishna Somandepalli, Shrikanth Narayanan
Published in: IEEE Trans. Multim. (2023)
Keyphrases
- weakly supervised
- cross modal
- object localization
- visual data
- multi modal
- object class
- topic models
- video data
- multimedia
- video sequences
- semantic concepts
- semi supervised
- superpixels
- object detection
- object detectors
- video frames
- higher level
- named entities
- visual information
- image sequences
- image retrieval
- object recognition
- multiscale