Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding.
Kenton Lee, Mandar Joshi, Iulia Raluca Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova
Published in: ICML (2023)