Enhance Temporal Relations in Audio Captioning with Sound Event Detection.
Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu
Published in: CoRR (2023)
Keyphrases
- event detection
- temporal relations
- complex events
- video event
- primitive events
- soccer video
- event recognition
- video event detection
- temporal information
- video analysis
- activity recognition
- spatial relations
- multimedia
- temporal reasoning
- video surveillance
- composite events
- temporal structure
- mid level
- audio visual
- sports video
- visual information
- multiscale
- visual data
- scan statistic
- video clips
- spatial information
- information extraction
- computer vision
- high level