AVSegFormer: Audio-Visual Segmentation with Transformer.
Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu
Published in: CoRR (2023)
Keyphrases
- audio visual
- multi modal
- temporal segmentation
- visual information
- temporal context
- audio visual speech recognition
- person authentication
- image segmentation
- multi stream
- multimedia
- visual data
- emotion recognition
- video summarization
- multimodal fusion
- low level
- domain knowledge
- image representation
- image retrieval
- video sequences
- high level
- e learning