Gaze-assisted automatic captioning of fetal ultrasound videos using three-way multi-modal deep neural networks.
Mohammad Alsharid, Yifan Cai, Harshita Sharma, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble
Published in: Medical Image Anal. (2022)
Keyphrases
- multi modal
- neural network
- ultrasound images
- video search
- multi modality
- eye tracking data
- eye tracking
- audio visual
- semantic concepts
- high dimensional
- uni modal
- video frames
- video sequences
- video data
- video database
- magnetic resonance imaging
- humanoid robot
- cross modal
- image annotation
- multiple modalities
- feature space