Pre-Finetuning for Few-Shot Emotional Speech Recognition.
Maximillian Chen, Zhou Yu
Published in: CoRR (2023)
Keyphrases
- speech recognition
- hidden Markov models
- language model
- speech recognizer
- speech synthesis
- speech processing
- automatic speech recognition
- pattern recognition
- speech understanding
- speech recognition systems
- speech recognition technology
- speech signal
- video sequences
- speaker dependent
- speech recognizers
- speech retrieval
- speaker independent
- noisy environments
- video shots
- video data
- speech recognition errors
- visual features
- isolated word
- emotion recognition
- machine learning
- keyword spotting
- video content
- multimodal
- probabilistic model