VLCDoC: Vision-Language contrastive pre-training model for cross-modal document classification.

Souhail Bakkali, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades
Published in: Pattern Recognit. (2023)
Keyphrases
  • document classification
  • probabilistic model
  • computer vision
  • cross modal
  • neural network
  • high level
  • multi modal
  • databases
  • machine learning
  • feature extraction
  • training data
  • multiscale
  • text mining