Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning.

Published in: INTERSPEECH (2023)

Keyphrases