Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction.
Gili Lior, Yoav Goldberg, Gabriel Stanovsky
Published in: CoRR (2024)
Keyphrases
- structure extraction
- document structure
- document layout
- document collections
- INEX book track
- document retrieval
- semi-supervised
- document images
- relevant documents
- document representation
- information retrieval
- similarity measure
- text documents
- structured documents
- information retrieval systems
- xml documents
- unsupervised learning
- semantic information
- xml elements
- document clustering
- web documents
- text summarization
- query terms
- database
- query processing
- ad hoc retrieval
- focused retrieval
- feature selection
- retrieval systems
- machine learning