DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.

Hao Feng, Qi Liu, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang
Published in: CoRR (2023)
Keyphrases
  • frequency domain
  • spatial domain
  • probabilistic model
  • image segmentation
  • computer vision
  • multiscale
  • subband
  • autoregressive