Multimodal grid features and cell pointers for Scene Text Visual Question Answering.

Published in: CoRR (2020)

Keyphrases