Modèle probabiliste pour l'extraction de structures dans les documents web [A probabilistic model for extracting structures from web documents].
Guillaume Wisniewski, Francis Maes, Ludovic Denoyer
Published in: Document Numérique (2007)
Keyphrases
- web documents
- web data
- information extraction
- website
- multilingual documents
- web information
- data extraction
- digital documents
- web pages
- textual data
- web applications
- content similarity
- extraction rules
- information retrieval
- document classification
- web crawler
- web information extraction
- web queries
- text information
- document retrieval
- structured information
- natural language processing
- topic specific
- newspaper articles
- text content
- document representation
- web mining
- semantic web
- document repositories
- database
- textual features
- answering questions
- google scholar
- electronic documents
- open directory project
- vector space model
- web content
- text documents
- relevant documents
- retrieval systems
- document collections
- information retrieval systems
- focused crawling
- desired information
- web environment
- user-generated content
- query terms
- semi-structured
- data interchange
- web search
- metadata