Listen as you wish: Fusion of audio and text for cross-modal event detection in smart cities.
Haoyu Tang, Yupeng Hu, Yunxiao Wang, Shuaike Zhang, Mingzhu Xu, Jihua Zhu, Qinghai Zheng
Published in: Inf. Fusion (2024)
Keyphrases
- cross modal
- event detection
- multi modal
- smart cities
- multimedia retrieval
- visual data
- video analysis
- multimedia databases
- activity recognition
- smart city
- image retrieval
- text retrieval
- visual similarity
- database
- keywords
- information retrieval
- text mining
- semantic information
- data fusion
- text documents
- web images
- video search
- text data
- multimedia
- high dimensional
- wordnet
- information extraction
- visual features