Audio-visual speech separation based on joint feature representation with cross-modal attention.

Published in: CoRR (2022)
