KVP10k: A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents.
Oshri Naparstek, Roi Pony, Inbar Shapira, Foad Abo Dahood, Ophir Azulai, Yevgeny Yaroker, Nadav Rubinstein, Maksym Lysak, Peter W. J. Staar, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Elad Amrani, Idan Friedman, Orit Prince, Yevgeny Burshtein, Adi Raz Goldfarb, Udi Barzelay
Published in: CoRR (2024)
Keyphrases
- document collections
- information retrieval
- text documents
- relevant documents
- information extraction
- database
- decision making
- business processes
- data mining
- electronic documents
- document classification
- business models
- document retrieval
- business process
- business intelligence
- XML documents
- pairwise
- similarity scores
- information systems
- information retrieval systems
- web documents
- benchmark datasets
- document clustering
- document representation
- textual data
- document analysis
- automatic extraction
- electronic commerce
- vector space model
- news articles
- feature set
- text mining
- metadata
- real world