Information Extraction from Webpages Based on DOM Distances.
Carlos J. Castillo, Héctor Valero, José Guadalupe Ramos, Josep Silva
Published in: CICLing (2) (2012)
Keyphrases
- information extraction
- web pages
- website
- web documents
- search engine
- natural language processing
- data extraction
- information retrieval
- distance measure
- web search engines
- structured data
- free text
- question answering
- keywords
- precision and recall
- web mining
- named entity recognition
- relation extraction
- text processing
- machine learning
- Euclidean distance
- conditional random fields
- named entities
- semi-structured
- textual data
- open domain
- neural network
- distance function
- xml documents
- ontology-based information extraction
- text documents
- text mining
- information overload
- relational databases
- database
- web content
- co-occurrence
- dissimilarity measure
- object-oriented
- relational learning
- markup language
- data structure
- html documents