Extraction of relevant components using shallow structure of HTML documents.

Jun Zeng, Brendan Flanagan, Toshihiko Sakai, Sachio Hirokawa

Published in: FSKD (2012)

Keyphrases

HTML documents
automatic extraction
web information extraction
repeated patterns
content extraction
web page retrieval
information extraction
web documents
database
web pages
natural language processing
building blocks
structured documents