Video-realistic expressive audio-visual speech synthesis for the Greek language.
Panagiotis Paraskevas Filntisis, Athanasios Katsamanis, Pirros Tsiakoulis, Petros Maragos
Published in: Speech Commun. (2017)
Keyphrases
- audio visual
- speech synthesis
- text to speech
- video summarization
- visual data
- multimedia
- meeting room
- multi modal
- audio features
- speech recognition
- audio visual content
- visual information
- temporal context
- multi stream
- video data
- audio visual speech recognition
- multimodal fusion
- video sequences
- vocal tract
- emotion recognition
- speaker verification
- key frames
- video content
- natural language
- high dimensional
- high dimensional data
- multimedia data
- video frames
- space time
- spatio temporal
- image data
- visual content
- human actions
- human motion
- temporal information
- image sequences
- data sets