UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling.

Published in: ECCV (36) (2022)
