Classifiers without borders: incorporating fielded text from neighboring web pages.
Xiaoguang Qi, Brian D. Davison
Published in: SIGIR (2008)
Keyphrases
- web pages
- keywords
- web documents
- textual content
- search engine
- plain text
- text retrieval
- lexical features
- website
- training data
- text content
- improve recognition accuracy
- web page classification
- decision trees
- content features
- information retrieval
- web search
- anchor text
- free text
- web search engines
- machine learning algorithms
- text classifiers
- support vector
- text mining
- training set
- linear classifiers
- text data
- svm classifier
- classification algorithm
- feature selection
- naive bayes
- web content
- machine learning
- html pages
- feature set
- data extraction
- textual data
- web images
- test set
- roc curve
- data records
- image classification
- information retrieval systems
- class labels
- multi class
- dynamic content
- support vector machine
- database
- web users
- web information extraction
- learning algorithm
- web server