Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification.
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
Published in: CoRR (2023)
Keyphrases
- speech recognition
- pattern recognition
- speech processing
- speaker identification
- hidden Markov models
- speech recognition technology
- speech recognizer
- audio visual speech recognition
- language model
- speech signal
- speech synthesis
- noisy environments
- audio visual
- signal processing
- machine learning
- feature space
- feature selection
- multimedia
- probabilistic neural network
- classification accuracy
- feature extraction
- multi stream
- mel frequency cepstral coefficients
- speech recognition systems
- support vector machine
- broadcast news
- visual speech
- automatic speech recognition
- text classification
- image classification
- speaker dependent
- speech retrieval
- image processing