Multimodal grid features and cell pointers for scene text visual question answering.

Published in: Pattern Recognit. Lett. (2021)

Keyphrases