Information Redundancy and Biases in Public Document Information Extraction Benchmarks.
Seif Laatiri, Pirashanth Ratnamogan, Joël Tang, Laurent Lam, William Vanhuffel, Fabien Caspani
Published in: ICDAR (3) (2023)
Keyphrases
- information redundancy
- information extraction
- web documents
- text documents
- information retrieval
- text summarization
- unstructured documents
- text mining
- image quality
- information retrieval systems
- natural language processing
- precision and recall
- cross document
- document classification
- free text
- retrieval systems
- question answering
- named entities
- open domain
- word sense disambiguation
- machine learning
- document retrieval
- document images
- keywords
- document clustering
- database
- conditional random fields
- named entity recognition
- semantic information
- mutual information
- structured data
- document collections
- relation extraction
- ontology based information extraction
- sequence patterns
- document content
- document analysis
- relational learning
- textual data
- document representation
- vector space model
- relevant documents
- machine translation