GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions.

Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren
Published in: CoRR (2023)
Keyphrases