Glean: Structured Extractions from Templatic Documents.
Sandeep Tata, Navneet Potti, James B. Wendt, Lauro Beltrão Costa, Marc Najork, Beliz Gunel
Published in: Proc. VLDB Endow. (2021)
Keyphrases
- unstructured data
- document collections
- information retrieval
- information retrieval systems
- textual data
- xml documents
- text documents
- metadata
- document classification
- relevant documents
- web documents
- structured data
- real world
- retrieval systems
- document retrieval
- document clustering
- free text
- user queries
- document representation
- legal documents
- relational databases
- digital documents
- text retrieval
- ranked list
- electronic documents
- latent semantic analysis
- database
- vector space model
- time stamped
- document analysis
- structured documents
- search engine
- keywords
- digital libraries
- information extraction