AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning.

Jongsuk Kim, Jiwon Shin, Junmo Kim
Published in: CoRR (2024)
Keyphrases