SpeechSyncNet: Speech to Talking Landmark via the fusion of prior frame landmark and the audio.
Xuan-Nam Cao, Quoc-Huy Trinh, Van-Son Ho, Minh-Triet Tran. Published in: VCIP (2023)
Keyphrases
- audio visual
- landmark extraction
- landmark detection
- multimedia
- speech recognition
- landmark recognition
- speaker identification
- audio stream
- visual landmarks
- cepstral features
- speech processing
- signal processing
- image registration
- visual information
- emotion recognition
- digital audio
- multimodal fusion
- prosodic features
- broadcast news
- landmark points
- text to speech
- spoken language
- audio features
- multi sensor
- information fusion
- data fusion
- mobile robot