PMC text mining subset in BioC: about three million full-text articles and growing.
Donald C. Comeau, Chih-Hsuan Wei, Rezarta Islamaj Dogan, Zhiyong Lu
Published in: Bioinform. (2019)
Keyphrases
- text mining
- journal articles
- scientific literature
- named entities
- digital libraries
- biomedical literature
- text documents
- natural language processing
- news corpus
- information retrieval systems
- textual documents
- information retrieval
- medical subject headings
- web mining
- textual data
- text categorisation
- text corpora
- news articles
- data mining
- information extraction
- text classification
- topic models
- news media
- data analysis
- knowledge discovery
- text clustering
- link analysis
- real world
- medical domain
- probabilistic topic models
- machine learning
- document clustering
- topic modeling
- latent dirichlet allocation
- tens of thousands
- multiple domains
- plain text
- newspaper articles
- high quality
- social networks
- structured data