Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.

Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro
Published in: ICASSP (2024)
Keyphrases