Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language.
Mark Hamilton
Andrew Zisserman
John R. Hershey
William T. Freeman
Published in: CoRR (2024)
Keyphrases
language learning
natural language
visual information
low level
frequency domain
fourier transform
email
visual features
visual data
language processing
visual representation
specification language
fractional fourier transform
computer vision
computer mediated communication
social intelligence design