What does Kiki look like? Cross-modal associations between speech sounds and visual shapes in vision-and-language models.
Tessa Verhoef, Kiana Shahrasbi, Tom Kouwenhoven
Published in: CoRR (2024)
Keyphrases
- cross-modal
- language model
- multimodal
- language modeling
- speech recognition
- document retrieval
- speech sounds
- probabilistic model
- n-gram
- retrieval model
- query expansion
- multimedia retrieval
- automatic speech recognition
- computer vision
- visual recognition
- test collection
- image retrieval
- information retrieval
- visual data
- visual similarity
- text retrieval
- multimedia databases
- bag-of-words
- graph cuts
- hidden Markov models
- feature space
- object recognition