Digitizing and parsing semi-structured historical administrative documents from the G.I. Bill mortgage guarantee program.
Sara Lafia, David A. Bleckley, J. Trent Alexander
Published in: J. Documentation (2023)
Keyphrases
- semi structured
- semi structured documents
- web documents
- free text
- data collections
- web data
- structured data
- information extraction
- expert search
- content and structure
- semi structured data
- data model
- information integration
- html pages
- text mining
- information retrieval
- unstructured text
- structured knowledge
- search interface
- natural language
- information retrieval systems
- data extraction
- web data extraction
- web data sources
- document retrieval
- text documents
- natural language processing
- document collections
- keywords
- wrapper generation
- database systems
- web sources
- relevant documents
- unstructured data
- xml documents
- data sets
- artificial intelligence
- historical documents
- retrieval systems
- metadata
- knowledge rich
- web pages
- real world