WebDP: Understanding Discourse Structures in Semi-Structured Web Documents.
Peilin Liu, Hongyu Lin, Meng Liao, Hao Xiang, Xianpei Han, Le Sun
Published in: ACL (Findings) (2023)
Keyphrases
- semi structured
- web documents
- information extraction
- information integration
- tree structured patterns
- web data
- web pages
- semi structured data
- keywords
- data extraction
- semistructured data
- html documents
- wrapper generation
- unstructured text
- xml databases
- web content
- structured knowledge
- web search engines
- structural features
- web data sources
- database systems
- web sources
- structured data
- structured documents
- unstructured data
- semistructured documents
- document representation
- textual information
- machine learning
- relational databases
- natural language
- website
- information retrieval