Multi-modal Voice Activity Detection by Embedding Image Features into Speech Signal.
Yohei Abe, Akinori Ito
Published in: IIH-MSP (2013)
Keyphrases
- multi modal
- voice activity detection
- speech signal
- noisy environments
- image features
- speech recognition
- automatic speech recognition
- speaker verification
- speaker identification
- computer vision
- image classification
- hidden Markov models
- audio visual
- multi modality
- spectral analysis
- non stationary
- image content
- noise reduction
- feature vectors
- object recognition
- pattern recognition
- image representation
- video search
- language model
- high dimensional
- cross modal
- multiple modalities
- fundamental frequency
- uni modal
- automatic speech recognition systems
- neural network
- low level