Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model.

Fan Zhang, Naye Ji, Fuxing Gao, Siyuan Zhao, Zhaohan Wang, Shunman Li
Published in: CoRR (2023)