CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios.

Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip H. S. Torr, Xiaochun Cao
Published in: CoRR (2024)
Keyphrases