Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes.
Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu
Published in: CoRR (2024)
Keyphrases
- visual scene
- visual information
- complex scenes
- vision system
- visual attention
- object recognition
- visual data
- visual context
- object features
- natural images
- video sequences
- audio visual
- visual features
- d objects
- single image
- human computer interaction
- spatial relations
- image collections
- natural scenes
- multiscale
- computer vision