Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2.
Chun Xu, En-Wei Sun
Published in: CoRR (2024)
Keyphrases
- fine tuning
- audio visual
- audio stream
- audio signals
- broadcast news
- emotion recognition
- cepstral features
- text to speech
- speaker identification
- multimedia
- viable alternative
- fine tune
- audio features
- acoustic signals
- digital audio
- speech processing
- fine tuned
- linear predictive coding
- multi modal
- prosodic features
- audio video
- audio recordings
- generation process
- multi stream
- speech recognition
- automatic transcription
- acoustic features
- video clips
- human language
- content based video retrieval
- spoken documents
- speech music discrimination
- recognition engine
- speech synthesis
- speaker recognition
- automatic speech recognition
- visual information
- spontaneous speech
- visual speech
- audio files
- speech signal
- visual data
- low level features
- signal processing
- voice activity detection
- hidden markov models
- feature vectors