Improving Cross-Modal Visual Answer Localization in Chinese Medical Instructional Video Using Language Prompts.
Zineng Zhou, Jun Liu, Shuang Cheng, Haiyong Luo, Yang Gu, Jian Ye
Published in: NLPCC (3) (2023)
Keyphrases
- cross modal
- visual data
- multi modal
- multimedia
- multimedia retrieval
- multimedia databases
- semantic concepts
- multimedia data
- multiple modalities
- visual recognition
- video data
- image retrieval
- video sequences
- perceptual information
- visual information
- video frames
- video content
- video streams
- visual similarity
- human activities
- video analysis
- information retrieval
- contextual information
- high dimensional
- human motion
- key frames
- visual concepts
- visual features
- image data
- object recognition
- metadata