Categorizing Web Pages as a Preprocessing Step for Information Extraction.
Viktor Pekar, Richard Evans, Ruslan Mitkov. Published in: LREC (2004)
Keyphrases
- preprocessing step
- information extraction
- web pages
- web information extraction
- web documents
- data extraction
- preprocessing
- search engine
- website
- feature selection
- web page classification
- natural language processing
- named entity recognition
- data preprocessing
- precision and recall
- web mining
- text categorization
- information retrieval
- free text
- structured data
- feature extraction
- text mining
- dimensionality reduction
- web search engines
- web search
- semi structured
- web content
- machine learning
- keywords
- textual data
- named entities
- question answering
- link analysis
- web users
- web data
- link structure
- web server
- principal component analysis
- image processing
- linear feature extraction
- image segmentation and object recognition