Identification et structuration hiérarchique des titres dans les documents HTML [Identification and hierarchical structuring of headings in HTML documents].
Thierry Waszak, Claude de Loupy, Patrice Bellot
Published in: CORIA (2009)
Keyphrases
- document type
- document structure
- information retrieval
- web documents
- text documents
- document collections
- document retrieval
- keywords
- xml documents
- document classification
- information retrieval systems
- web pages
- document clustering
- semi-structured
- relevant documents
- information extraction
- digital documents
- html documents
- textual content
- web browser
- free text
- metadata
- retrieved documents
- legal documents
- electronic documents
- xml files
- structured documents
- vector space model
- ranked list
- web data
- multimedia