Boilerplate detection using shallow text features.
Christian Kohlschütter, Peter Fankhauser, Wolfgang Nejdl
Published in: WSDM (2010)
Keyphrases
- false positives
- image features
- information retrieval
- feature extraction
- feature vectors
- detection algorithm
- automatically extracted
- information extraction
- extracted features
- detection method
- content features
- additional features
- adaboost classifier
- false alarms
- automatic detection
- detection rate
- spatial information
- feature set
- co-occurrence
- natural language processing
- object detection
- computer vision
- semantic information
- question answering
- keypoints
- text mining
- detection accuracy
- low level
- feature space
- feature selection