Combining Visual and Movement Modalities for No-audio Speech Detection.
Liandong Li, Zhuo Hao, Bo Sun
Published in: MediaEval (2019)
Keyphrases
- cross modal
- visual data
- audio visual
- visual information
- multi modal
- audio stream
- visual speech
- acoustic signals
- content based video retrieval
- single modality
- broadcast news
- multimodal fusion
- visual features
- audio features
- multimedia
- text to speech
- speaker identification
- audio signals
- cepstral features
- audio recordings
- emotion recognition
- speech processing
- digital audio
- audio video
- speech signal
- visual cues
- signal processing
- linear predictive coding
- hidden markov models
- automatic transcription
- speech music discrimination
- noisy environments
- video search
- low level
- semantic context
- multi stream
- acoustic features
- prosodic features
- acoustic signal
- speaker recognition
- multimedia data
- video data
- human computer interaction