Multimodal Video Captioning using Object-Auditory Information Fusion with Transformers.

Berkay Selbes, Mustafa Sert
Published in: NarSUM@MM (2023)
Keyphrases