Label-Efficient Self-Training for Attribute Extraction from Semi-Structured Web Documents.
Ritesh Sarkhel, Binxuan Huang, Colin Lockard, Prashant Shiralkar
Published in: CoRR (2022)
Keyphrases
- semi structured
- web documents
- information extraction
- data extraction
- structured data
- web data extraction
- information integration
- web pages
- web information extraction
- tree structured patterns
- web search engines
- semi structured data
- wrapper generation
- html documents
- keywords
- web data
- web data sources
- semistructured data
- text mining
- data model
- xml databases
- wrapper induction
- data mining
- semistructured documents
- structured knowledge
- web sources
- automatic extraction
- natural language
- textual information
- web content