The BigGrams: the semi-supervised information extraction system from HTML: an improvement in the wrapper induction.
Marcin Mironczuk
Published in: Knowl. Inf. Syst. (2018)
Keyphrases
- information extraction
- wrapper induction
- semi-supervised
- semi-structured
- named entity recognition
- active learning
- wrapper generation
- natural language processing
- multi-view
- free text
- relation extraction
- structured data
- web information extraction
- semi-supervised learning
- web documents
- information retrieval
- labeled data
- named entities
- information extraction systems
- website
- question answering
- machine learning
- unlabeled data
- text documents
- text processing
- natural language
- web mining
- text mining
- supervised learning
- pairwise
- web pages
- information sources
- unsupervised learning
- html documents
- extraction patterns
- data sets