CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models.

Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian J. McAuley
Published in: CoRR (2023)