Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts.
Tomasz Stanislawek, Filip Gralinski, Anna Wróblewska, Dawid Lipinski, Agnieszka Kaliska, Paulina Rosalska, Bartosz Topolski, Przemyslaw Biecek
Published in: ICDAR (1) (2021)
Keyphrases
- information extraction
- free text
- web documents
- text documents
- information retrieval
- natural language text
- question answering
- document collections
- text mining
- unstructured documents
- data collections
- unstructured text
- database
- textual data
- machine learning
- natural language processing
- keywords
- legal documents
- benchmark datasets
- text analysis
- resource intensive
- text processing
- text data
- document classification
- precision and recall
- document retrieval
- complex systems
- structured data
- web search
- xml documents
- metadata
- web mining
- data extraction
- natural language
- high level
- feature selection