Multimodal city-verification on Flickr videos using acoustic and textual features.
Howard Lei, Jaeyoung Choi, Gerald Friedland
Published in: ICASSP (2012)
Keyphrases
- textual features
- bag of words
- multimodal biometrics
- video sequences
- geo referenced
- video frames
- social media
- multi modal
- photo collections
- audio features
- visual features
- video data
- multimedia
- image retrieval
- event recognition
- video content
- web pages
- user generated
- image classification
- user generated content
- image collections
- key frames
- probabilistic model