CrossA11y: Identifying Video Accessibility Issues via Cross-modal Grounding.
Xingyu Bruce Liu, Ruolin Wang, Dingzeyu Li, Xiang 'Anthony' Chen, Amy Pavel
Published in: CoRR (2022)
Keyphrases
- cross modal
- multi modal
- visual data
- video sequences
- video data
- multimedia databases
- multimedia retrieval
- semantic concepts
- image retrieval
- video content
- multimedia
- high level
- visual recognition
- video frames
- video streams
- computer vision
- key frames
- multimedia data
- video analysis
- visual information
- space time
- automatic image annotation
- low level
- information retrieval
- perceptual information