Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers.
Piper Wolters, Chris Daw, Brian Hutchinson, Lauren Phillips
Published in: CoRR (2021)
Keyphrases
- event detection
- environmental sounds
- acoustic features
- visual features
- speech signal
- sound source
- sports video
- temporal segmentation
- speaker verification
- audio visual
- automatic speech recognition
- video surveillance
- video analysis
- audio features
- speech recognition
- video event detection
- activity recognition
- video shots
- video indexing
- event recognition
- video sequences
- scan statistic
- visual information
- key frames
- image retrieval
- music information retrieval
- speaker identification
- machine learning
- semantic concepts
- visual content
- video content
- image classification
- low level
- noisy environments
- focus of attention
- video data
- high level