MAVAR-SE: Multi-scale Audio-Visual Association Representation Network for End-to-End Speaker Extraction.
Shilong Yu, Chenhui Yang
Published in: MMM (2) (2024)
Keyphrases
- audio visual
- end to end
- congestion control
- multiscale
- internet protocol
- wireless ad hoc networks
- multi modal
- visual information
- speaker verification
- image representation
- visual data
- emotion recognition
- transport layer
- packet loss rate
- multimedia
- ad hoc networks
- admission control
- multi stream
- differentiated services
- transport protocol
- audio visual speech recognition
- application layer