Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning.
Wenrui Li, Xi-Le Zhao, Zhengyu Ma, Xingtao Wang, Xiaopeng Fan, Yonghong Tian
Published in: ACM Multimedia (2023)
Keyphrases
- audio visual
- visual data
- multi modal
- video summarization
- visual information
- motion estimation
- image sequences
- multi stream
- space time
- audio visual speech recognition
- moving objects
- person authentication
- human motion
- high dimensional
- spatial and temporal
- multimedia
- databases
- image data
- co occurrence
- domain knowledge
- human computer interaction
- three dimensional
- knowledge base
- data sets