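Modèle probabiliste pour l'extraction de structures dans les documents semistructurés - Application aux documents Web. [A probabilistic model for structure extraction in semi-structured documents: application to Web documents.]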
Guillaume Wisniewski, Ludovic Denoyer, Francis Maes, Patrick Gallinari
Published in: CORIA (2006)
Keyphrases
- web documents
- document collections
- multilingual documents
- information retrieval systems
- information retrieval
- digital documents
- web information
- web data
- relevant documents
- xml documents
- keywords
- text documents
- information extraction
- structured information
- multimedia documents
- document classification
- document retrieval
- web applications
- web pages
- semantic web
- database
- web crawler
- electronic documents
- text information
- textual content
- metadata
- textual data
- structured documents
- retrieval systems
- website
- ranked list
- relevance feedback
- web content
- document clustering
- semantic information
- content similarity
- page layout
- user queries