New Techniques for the Discovery of Logical Documents in Web.
Keishi Tajima, Katsumi Tanaka
Published in: DANTE (1999)
Keyphrases
- web documents
- web data
- web information
- multilingual documents
- information retrieval
- web pages
- logical structure
- web mining
- document repositories
- website
- digital documents
- information retrieval systems
- text information
- web applications
- information sources
- structured information
- document classification
- newspaper articles
- database
- textual data
- open directory project
- metadata
- text documents
- xml documents
- web search engines
- semantic web
- focused crawling
- digital libraries
- content similarity
- knowledge discovery
- web content
- retrieval systems
- search interface
- multimedia documents
- document clustering
- electronic documents
- information extraction
- web crawler
- relevant documents