Taris: An online speech recognition framework with sequence to sequence neural networks for both audio-only and audio-visual speech.
George Sterpu, Naomi Harte
Published in: Comput. Speech Lang. (2022)
Keyphrases
- speech recognition
- audio visual speech recognition
- speaker identification
- neural network
- visual speech
- multi stream
- cepstral coefficients
- audio visual
- automatic speech recognition
- hidden markov models
- pattern recognition
- speech recognition technology
- broadcast news
- noisy environments
- speech signal
- audio signals
- language model
- audio signal
- multi modal
- visual speech recognition
- computer vision
- image processing
- speaker recognition
- multimedia
- speech synthesis
- information retrieval
- bayesian networks
- machine learning