Extracting domain-specific terms from unlabeled web documents by bootstrapping and term classifiers.
Tao Liu, Xiaolong Wang, Bingquan Liu, Yuanchao Liu, Minghui Li
Published in: SMC (2007)
Keyphrases
- web documents
- domain-specific
- information extraction
- training data
- document representation
- semi-structured
- general purpose
- web pages
- relation extraction
- training examples
- document classification
- class labels
- active learning
- training set
- keywords
- co-occurrence
- named entity recognition
- unlabeled data
- link structure
- web directories
- web search engines
- query terms
- web content
- data mining
- tree-structured patterns
- vector space model
- classification algorithm
- training samples
- labeled data
- web search
- machine learning