Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction.
Gili Lior, Yoav Goldberg, Gabriel Stanovsky
Published in: ACL (Findings) (2024)
Keyphrases
- structure extraction
- document structure
- document layout
- document collections
- document representation
- inex book track
- structured documents
- document images
- document retrieval
- information retrieval
- xml documents
- information retrieval systems
- similarity measure
- database
- document clustering
- unsupervised learning
- semi supervised
- document analysis
- text summarization
- retrieval systems
- text documents
- data fusion
- xml elements
- keywords
- semantic information
- search engine
- topic models