Multi-Modality Speech Recognition Driven by Background Visual Scenes.
Cheng Luo, Yiguang Liu, Wenhui Sun, Zhoujian Sun
Published in: ICASSP (2024)
Keyphrases
- speech recognition
- multi-modality
- visual scene
- multi-modal
- medical images
- automatic speech recognition
- information theoretic
- hidden markov models
- image registration
- mutual information
- speech recognizer
- pattern recognition
- visual information
- vision system
- language model
- complex scenes
- speech signal
- speech recognition systems
- visual attention
- imaging modalities
- background noise
- speech synthesis
- noisy environments
- object recognition
- speaker identification
- video sequences
- image collections
- spatial relations
- natural scenes
- higher order
- co-occurrence
- low level
- feature vectors
- data mining