STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji
Published in: CoRR (2023)
Keyphrases
- audio visual
- real scenes
- spatial and temporal
- multi modal
- space time
- sound source
- spatio temporal
- visual information
- augmented reality
- depth map
- event detection
- visual data
- multimedia
- spatial information
- temporal information
- human actions
- multi stream
- keywords
- audio features
- action recognition
- multi view
- image data
- moving objects
- stereo images
- stereo pair
- metadata
- high level
- three dimensional
- feature selection