A Benchmark of PDF Information Extraction Tools Using a Multi-task and Multi-domain Evaluation Framework for Academic Documents.
Norman Meuschke, Apurva Jagdale, Timo Spinde, Jelena Mitrovic, Bela Gipp
Published in: iConference (2) (2023)
Keyphrases
- evaluation framework
- multi-task
- multi-domain
- information extraction
- web documents
- text documents
- learning tasks
- information retrieval
- cross-domain
- evaluation process
- evaluation methodology
- multi-class
- transfer learning
- precision and recall
- evaluation measures
- information retrieval systems
- semantic annotation
- domain-specific
- text mining
- natural language processing
- evaluation metrics
- conditional random fields
- document collections
- machine learning
- feature selection
- document clustering
- user queries
- learning problems
- document retrieval
- prior knowledge
- decision trees
- relevant documents
- data sets
- learning to rank
- co-occurrence
- general-purpose
- probabilistic model
- text summarization
- learning algorithm
- real-world
- neural network