Joint Speech Recognition and Audio Captioning.
Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe
Published in: ICASSP (2022)
Keyphrases
- speech recognition
- speaker identification
- speech processing
- speech recognition technology
- automatic speech recognition
- speech synthesis
- hidden Markov models
- cepstral coefficients
- audio visual speech recognition
- language model
- speech signal
- speech recognizer
- broadcast news
- speech understanding
- noisy environments
- multimedia
- keyword spotting
- speaker dependent
- speaker recognition
- isolated word
- pattern recognition
- speech recognition systems
- speaker independent
- speech recognition errors
- neural network
- voice activity detection
- audio signals
- visual data
- visual information
- multi modal
- signal processing
- audio signal
- speech retrieval
- speech recognizers
- computer vision