Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory.
Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2021)
Keyphrases
- text to speech
- fundamental frequency
- emotion recognition
- speech synthesis
- visual features
- speech sounds
- speech recognition
- memory usage
- voice activity detection
- text to speech synthesis
- speech recognition errors
- high level
- automatic speech recognition systems
- memory requirements
- visual information
- reconstruction method
- acoustic features
- content based video retrieval
- speech quality
- low level
- three dimensional
- compressive sensing
- dialogue system
- spoken language
- speech signal
- audio visual
- multi modal