Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion.

Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui
Published in: CoRR (2024)
Keyphrases
  • multimodal fusion
  • audio visual
  • multi modal
  • high robustness
  • high level
  • visual features