Inscriptis - A Python-based HTML to text conversion library optimized for knowledge extraction from the Web.
Albert Weichselbraun
Published in: J. Open Source Softw. (2021)
Keyphrases
- knowledge extraction
- textual documents
- web pages
- web documents
- website
- html pages
- knowledge discovery
- textual data
- web applications
- plain text
- data mining
- text information
- information retrieval and extraction
- web content
- systems engineering
- information extraction
- medical databases
- text content
- semi structured
- semantic web
- html documents
- database
- information retrieval
- keywords
- data extraction
- web images
- open source
- web browser
- programming language
- text mining
- text documents
- free text
- anchor text
- web mining
- end users
- association rules
- web development
- textual features
- bibliographic information
- diagnostic imaging
- digital documents
- document structure
- machine learning
- databases