GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions.
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
Published in:
CoRR (2023)
Keyphrases
image regions
language generation
image features
English text
image data
image content
text to speech synthesis
low level
computational linguistics
natural language
vision system
spatial context
computer vision
text documents
text mining
image pixels
salient regions
image similarity
natural language processing