Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation.
Shaofei Huang, Han Li, Yuqing Wang, Hongji Zhu, Jiao Dai, Jizhong Han, Wenge Rong, Si Liu
Published in: CoRR (2023)
Keyphrases
- audio visual
- visual data
- multi modal
- temporal segmentation
- visual information
- audio visual speech recognition
- multi stream
- video scene
- emotion recognition
- query processing
- audio features
- multimodal fusion
- multimedia
- multiscale
- audio visual content
- video data
- image regions
- database
- retrieval systems
- spatial context
- high dimensional data
- data sources
- data model
- image sequences
- three dimensional
- data sets