Inscriptis - A Python-based HTML to text conversion library optimized for knowledge extraction from the Web.
Albert Weichselbraun
Published in: CoRR (2021)
Keyphrases
- knowledge extraction
- textual documents
- web pages
- textual data
- web documents
- html pages
- website
- text information
- knowledge discovery
- plain text
- text mining
- information retrieval and extraction
- web development
- information extraction
- data mining
- web applications
- systems engineering
- medical databases
- html documents
- web images
- textual information
- user generated content
- text documents
- database
- web browser
- semi structured
- web mining
- open source
- keywords
- text data
- textual features
- bibliographic information
- dynamic content
- text content
- linked data
- end users
- database driven
- digital documents
- semantic web
- programming language
- information retrieval
- data sets