Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout.
Filip Gralinski, Tomasz Stanislawek, Anna Wróblewska, Dawid Lipinski, Agnieszka Kaliska, Paulina Rosalska, Bartosz Topolski, Przemyslaw Biecek
Published in: CoRR (2020)
Keyphrases
- information extraction
- free text
- web documents
- information retrieval
- text documents
- xml documents
- information retrieval systems
- text analysis
- textual data
- natural language processing
- natural language text
- web mining
- precision and recall
- relational databases
- resource intensive
- unstructured documents
- document classification
- ranked list
- document collections
- metadata
- machine learning
- semantic information
- document representation
- structured data
- clustering algorithm
- retrieved documents
- data extraction
- document analysis
- text categorization
- text mining
- document image retrieval