CNNs and Fisher Vectors for No-Audio Multimodal Speech Detection
Jose Vargas Quiros, Hayley Hung
Published in: MediaEval (2019)
Keyphrases
- audio-visual
- fisher vectors
- multi-stream
- audio stream
- image classification
- multimodal
- visual information
- object detection
- speech recognition
- broadcast news
- visual classification
- person re-identification
- text-to-speech
- multiscale
- multimodal interfaces
- visual data
- similarity search
- signal processing
- image retrieval
- visual speech
- pairwise
- image sequences