City-Identification of Flickr Videos Using Semantic Acoustic Features.
Benjamin Elizalde, Guan-Lin Chao, Ming Zeng, Ian R. Lane
Published in: CoRR (2016)
Keyphrases
- acoustic features
- audio features
- speaker verification
- visual features
- music information retrieval
- speech signal
- video sequences
- automatic speech recognition
- image retrieval
- similarity measure
- audio visual
- video content
- video data
- image collections
- cross correlation
- speech recognition
- noisy environments
- multi modal
- high dimensional
- high level