Multi-level, multi-modal interactions for visual question answering over text in images

Published in: World Wide Web (2022)

Keyphrases