GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions.

Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren
Published in: CoRR (2023)