Extracting Logical Hierarchical Structure of HTML Documents Based on Headings.
Tomohiro Manabe, Keishi Tajima
Published in: Proc. VLDB Endow. (2015)
Keyphrases
- hierarchical structure
- html documents
- web pages
- automatic extraction
- web documents
- hierarchically structured
- hierarchical structures
- web page retrieval
- semantic information
- semi-structured
- website
- structured documents
- xml documents
- web content
- search engine
- image representation
- tree structure
- semistructured data
- multiscale
- web data
- database
- feature extraction
- data management
- multi-dimensional
- semi-structured data
- co-occurrence
- nearest neighbor
- databases
- data points