Speech Guided Masked Image Modeling for Visually Grounded Speech.
Jongbhin Woo, Hyeonggon Ryu, Arda Senocak, Joon Son Chung
Published in: ICASSP (2024)
Keyphrases
- speech recognition
- input image
- image data
- single image
- image retrieval
- image features
- image analysis
- image content
- multiscale
- speech signal
- image classification
- edge detection
- image representation
- image segmentation
- image structure
- pattern recognition
- image collections
- automatic speech recognition
- region of interest
- template matching
- text to speech
- high resolution
- image regions
- keypoints
- segmentation method
- image database
- image pixels
- audio visual
- human observers
- spoken language
- image matching
- vector field
- lighting conditions
- low level
- similarity measure
- background noise
- image processing