Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and Cora.
Mark Grennan, Joeran Beel
Published in: CoRR (2020)
Keyphrases
- data sets
- spatial data
- data analysis
- statistical analysis
- raw data
- missing data
- database
- data processing
- human subjects
- data sources
- prior knowledge
- machine learning
- data points
- image data
- text mining
- end users
- data collection
- computer systems
- training samples
- training examples
- pattern matching
- sensor data
- natural language
- similarity measure
- labelled data