Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning.
Saurabhchand Bhati
Jesús Villalba
Laureano Moro-Velázquez
Thomas Thebaud
Najim Dehak
Published in:
INTERSPEECH (2023)
Keyphrases
visual learning
image features
text graphics
image segmentation
input image
multiscale
image retrieval
image classification
image representation
hidden markov models
test images
high resolution
feature points
similarity measure
probabilistic model
object recognition
pattern recognition
spatial information