DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
Hao Feng
Qi Liu
Hao Liu
Wengang Zhou
Houqiang Li
Can Huang
Published in:
CoRR (2023)
Keyphrases
frequency domain
spatial domain
probabilistic model
image segmentation
computer vision
multiscale
subband
autoregressive