Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference.
Raymond Kosala, Maurice Bruynooghe, Jan Van den Bussche, Hendrik Blockeel
Published in: IJCAI (2003)
Keyphrases
- web documents
- information extraction
- tree automata
- finite automata
- regular expressions
- semi structured
- tree languages
- document classification
- grammatical inference
- tree structured patterns
- natural language processing
- web search engines
- text mining
- textual data
- information retrieval
- relation extraction
- conditional random fields
- labeled trees
- named entities
- web mining
- structured data
- web pages
- tree structure
- textual information
- unstructured documents
- html documents
- focused crawling
- web content
- web data
- machine learning
- semistructured documents
- structured documents
- link structure
- text documents
- index structure
- knowledge discovery
- data extraction
- context free grammars
- r tree
- pattern matching
- monadic second order logic
- data model
- natural language