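Automatic categorization of Chinese web pages: specialized vs. general-public documents on smoking (original title: Catégorisation automatique de pages web chinoises - documents spécialisés vs grand public sur le tabagisme).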
Guiyao Ke, Pierre Zweigenbaum
Published in: CORIA (2009)
Keyphrases
- web documents
- web information
- web pages
- website
- page layout
- web data
- web users
- focused crawling
- web crawler
- content similarity
- html pages
- focused crawler
- keywords
- topic specific
- web content
- information retrieval
- web mining
- page contents
- textual content
- search engine
- open directory project
- multilingual documents
- page content
- hyperlink structure
- web objects
- information retrieval systems
- link structure
- current search engines
- information extraction
- web crawling
- document repositories
- document collections
- link graph
- web crawlers
- returned by search engines
- xml documents
- ranked list
- dynamic content
- text content
- data extraction
- web queries
- text classification
- web search
- user sessions
- web search engines
- relevant documents
- text documents
- trec web track
- metadata
- search interface
- internet archive
- semi structured