Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Tiantian Feng, Dimitrios Dimitriadis, Shrikanth Narayanan
Published in: CoRR (2024)
Keyphrases
- audio visual
- audio stream
- automatic transcription
- audio signals
- object recognition
- speaker identification
- modeling framework
- multimedia
- emotion recognition
- audio recordings
- cepstral features
- broadcast news
- modeling method
- text to speech
- audio video
- digital audio
- pattern recognition
- visual speech
- recognition engine
- spoken documents
- audio features
- feature extraction
- recognition rate
- multi modal
- visual features
- linear predictive coding
- prosodic features
- speech processing
- visual information
- recognition accuracy
- Gaussian mixture model
- human computer interaction
- mel frequency cepstral coefficients
- discriminative learning
- generative model
- acoustic features
- acoustic signals
- signal processing
- hidden Markov models
- speaker verification
- recognition algorithm