Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion.
Yingxuan Li
Ryota Hinami
Kiyoharu Aizawa
Yusuke Matsui
Published in:
CoRR (2024)
Keyphrases
multimodal fusion
audio visual
multi modal
high robustness
high level
visual features