Pre-Finetuning for Few-Shot Emotional Speech Recognition.
Maximillian Chen, Zhou Yu
Published in: INTERSPEECH (2023)
Keyphrases
- speech recognition
- hidden Markov models
- pattern recognition
- language model
- automatic speech recognition
- speech recognizer
- speech understanding
- speech processing
- speech synthesis
- speech signal
- speech recognition technology
- video shots
- speech recognition systems
- speaker identification
- visual features
- noisy environments
- video sequences
- speaker independent
- image processing
- video content
- speech retrieval
- emotional state
- key frames
- probabilistic model
- low level
- acoustic models
- speech recognizers