DEVICE: DEpth and VIsual ConcEpts Aware Transformer for TextCaps.

Dongsheng Xu, Qingbao Huang, Yi Cai

Published in: CoRR (2023)

Keyphrases

visual concepts
image content
video content
learning tasks
image annotation
image collections
object categories
positive examples
visual content
semantic concepts
semantic gap
object detection
visual features
multiscale
multimodal
multi-label
positive and negative
input image
computer vision
visual data