ViLaS: Integrating Vision and Language into Automatic Speech Recognition.
Minglun Han, Feilong Chen, Ziyi Ni, Linghui Meng, Jing Shi, Shuang Xu, Bo Xu
Published in: CoRR (2023)
Keyphrases
- automatic speech recognition
- speech recognition
- speech signal
- word error rate
- conversational speech
- speech retrieval
- broadcast news
- hidden Markov models
- recognition errors
- computer vision
- noisy environments
- spoken words
- word recognition
- spontaneous speech
- vision system
- image processing
- natural language
- speech corpus
- neural network
- language processing
- acoustic features
- multimodal
- speech sounds
- machine learning