Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages.
Ritesh Sarkhel, Binxuan Huang, Colin Lockard, Prashant Shiralkar
Published in: Proc. VLDB Endow. (2023)
Keyphrases
- semi structured
- information extraction
- data extraction
- web documents
- structured data
- web information extraction
- web pages
- web data
- free text
- web data sources
- text mining
- data model
- web data extraction
- information integration
- natural language processing
- wrapper generation
- web sources
- data collections
- html pages
- semi structured data
- xml databases
- semi structured documents
- structured knowledge
- search engine
- website
- natural language
- data analysis
- text documents
- web search
- web content
- real world
- information retrieval