Joint Speech Recognition and Audio Captioning.
Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe
Published in: CoRR (2022)
Keyphrases
- speech recognition
- speaker identification
- speech processing
- speech recognition technology
- audio visual speech recognition
- hidden markov models
- cepstral coefficients
- speech synthesis
- multimedia
- pattern recognition
- speech recognizer
- speech signal
- language model
- audio visual
- automatic speech recognition
- speech recognition systems
- speech understanding
- signal processing
- noisy environments
- audio signals
- keyword spotting
- audio signal
- speaker independent
- multi modal
- isolated word
- visual information
- speaker dependent
- image processing
- mel frequency cepstral coefficients
- speech recognizers
- multimedia information
- speaker recognition
- broadcast news