T-VSL: Text-Guided Visual Sound Source Localization in Mixtures.
Tanvir Mahmud, Yapeng Tian, Diana Marculescu
Published in: CoRR (2024)
Keyphrases
- source localization
- sound source
- wireless sensor networks
- visual information
- information retrieval
- computational auditory scene analysis
- web images
- visual features
- text mining
- mixture model
- low-level
- text data
- audio-visual
- keywords
- real-time
- vision system
- multi-modal
- non-stationary
- visual data
- reinforcement learning
- image sequences
- audio content