CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation.
Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao
Published in: ACM Multimedia (2023)
Keyphrases
- audio visual
- video segmentation
- multi modal
- video sequences
- visual information
- audio features
- visual data
- segmentation method
- video frames
- multimedia
- emotion recognition
- multi stream
- video summarization
- video analysis
- multimodal fusion
- audio visual speech recognition
- active contours
- graph cuts
- sports video
- spatio temporal
- pairwise
- image processing
- machine learning