LaTr: Layout-Aware Transformer for Scene-Text VQA.

Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha

Published in: CoRR (2021)

Keyphrases

scene text
natural scene images
text detection
image database
video database
scene images
complex background
video frames
object recognition
image data
image classification
face detection
text regions