SCI-3000: A Dataset for Figure, Table and Caption Extraction from Scientific PDFs.
Filip Darmanovic, Allan Hanbury, Markus Zlabinger
Published in: ICDAR (1) (2023)
Keyphrases
- database
- text extraction
- probability density function
- visual features
- mixture model
- video retrieval
- science education
- scientific data
- science learning
- artificial intelligence
- probability distribution functions
- scientific knowledge
- benchmark datasets
- scientific literature
- automatic extraction
- Bayesian networks
- caption text