CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation.
Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao
Published in: CoRR (2023)
Keyphrases
- audio visual
- video segmentation
- multi modal
- video sequences
- audio features
- visual information
- visual data
- video frames
- emotion recognition
- video analysis
- multimedia
- audio visual speech recognition
- segmentation method
- video summarization
- multi stream
- multimodal fusion
- audio visual content
- sports video
- multimedia databases
- visual features
- image features