WebCollectives: A light regular expression based web content extractor in Java.
Hayri Volkan Agun
Published in: SoftwareX (2023)
Keyphrases
- web content
- regular expressions
- pattern matching
- website
- query language
- finite automata
- web data
- xml schema
- web pages
- semistructured data
- open source
- web documents
- user generated
- deterministic finite automata
- string matching
- matching algorithm
- source code
- object oriented
- regular path queries
- semantic browsing
- static analysis
- web resources
- high level
- query evaluation
- data model
- web browsing
- social networks
- semistructured databases
- database