A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents.
Norman Meuschke, Apurva Jagdale, Timo Spinde, Jelena Mitrovic, Bela Gipp
Published in: CoRR (2023)
Keyphrases
- evaluation framework
- multi task
- multi domain
- information extraction
- text documents
- web documents
- information retrieval
- cross domain
- learning tasks
- transfer learning
- evaluation methodology
- evaluation process
- domain specific
- natural language processing
- multi class
- precision and recall
- learning problems
- evaluation measures
- information retrieval systems
- document collections
- machine learning
- semantic annotation
- document clustering
- document retrieval
- text mining
- evaluation metrics
- real world
- user queries
- data mining
- relevant documents
- recommender systems
- multimedia
- multi label