Information Redundancy and Biases in Public Document Information Extraction Benchmarks.
Seif Laatiri, Pirashanth Ratnamogan, Joël Tang, Laurent Lam, William Vanhuffel, Fabien Caspani
Published in: CoRR (2023)
Keyphrases
- information redundancy
- information extraction
- web documents
- text documents
- information retrieval
- unstructured documents
- text summarization
- natural language processing
- precision and recall
- information retrieval systems
- document collections
- named entities
- image quality
- text mining
- document images
- retrieval systems
- document classification
- question answering
- named entity recognition
- mutual information
- structured data
- tf-idf
- relation extraction
- free text
- document clustering
- data mining
- relational learning
- document analysis
- semi-structured
- keywords
- machine learning
- web mining
- high quality
- text processing
- digital documents
- document retrieval
- relevant documents
- vector space model
- user queries
- document representation
- natural language
- clustering algorithm
- open domain
- database