STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Aleksander Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji
Published in: NeurIPS (2023)
Keyphrases
- audio visual
- real scenes
- spatial and temporal
- multi modal
- space time
- sound source
- spatio temporal
- augmented reality
- visual information
- visual data
- stereo images
- depth map
- multimedia
- spatial information
- multi stream
- human actions
- audio features
- event detection
- metadata
- spatial relationships
- human activities
- moving objects
- image processing
- image data
- high dimensional
- stereo pair
- video sequences