Domain Specific Word Extraction from Hierarchical Web Documents: A First Step Toward Building Lexicon Trees from Web Corpora.
Jing-Shin Chang
Published in: SIGHAN@IJCNLP 2005 (2005)
Keyphrases
- web documents
- domain specific
- web corpora
- information extraction
- extraction rules
- n gram
- keywords
- semi structured
- web pages
- relation extraction
- web search engines
- query expansion
- document classification
- text mining
- automatic extraction
- query translation
- natural language
- wrapper induction
- information retrieval
- document representation
- link structure
- tree structured patterns
- knowledge discovery
- vector space model
- comparable corpora