Xtractor: A light wrapper for XML paragraph-centric documents.
Youakim Badr
Published in: SITIS (2005)
Keyphrases
- xml documents
- xml format
- semi structured documents
- document centric
- document structure
- metadata
- xml data
- extensible markup language
- xml schema
- structured documents
- document repository
- information retrieval
- document level
- xml queries
- content and structure
- electronic documents
- standard for data exchange
- web documents
- information extraction
- feature selection
- semi structured
- keywords
- xpath queries
- document type
- xml databases
- vector space model
- document retrieval
- semi structured data
- data model
- document collections
- markup language
- text documents
- sentence level
- relational databases
- document clustering
- logical structure
- information retrieval systems
- xml files
- database
- data integration
- linguistic features
- xml elements
- labeling scheme
- user queries
- relevant documents
- news stories
- black box