Leveraging Foundation models for Unsupervised Audio-Visual Segmentation.

Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu
Published in: CoRR (2023)
Keyphrases
  • audio visual
  • multi modal
  • image segmentation
  • multiscale
  • machine learning
  • high level
  • image retrieval
  • spatio temporal
  • image database
  • dimensionality reduction
  • visual data
  • temporal context