An Unsupervised Machine Learning Approach to Body Text and Table of Contents Extraction from Digital Scientific Articles.
Stefan Klampfl, Roman Kern
Published in: TPDL (2013)
Keyphrases
- table of contents
- scientific articles
- machine learning
- topic modeling
- text mining
- scientific literature
- information extraction
- text documents
- text processing
- text classification
- information retrieval
- topic models
- database
- knowledge discovery
- oracle database
- database administrators
- semi-supervised
- data mining
- data model
- data analysis
- sql server
- feature selection
- latent dirichlet allocation
- digital libraries
- natural language
- biomedical literature