KAP: Pre-training Transformers for Corporate Documents Understanding.
Ibrahim Souleiman Mahamoud, Mickaël Coustaty, Aurélie Joseph, Vincent Poulain D'Andecy, Jean-Marc Ogier
Published in: ICDAR Workshops (2) (2023)
Keyphrases
- information retrieval
- document collections
- document retrieval
- training phase
- xml documents
- retrieval systems
- information retrieval systems
- document representation
- document classification
- test set
- training examples
- digital libraries
- case study
- metadata
- training corpus
- text mining
- online learning
- active learning
- training set
- text documents
- vector space
- learning algorithm
- digital documents
- text classifiers
- document content
- database
- vector space model
- web documents
- supervised learning
- search engine
- data sets