PLAtE: A Large-scale Dataset for List Page Web Extraction.
Aidan San, Yuan Zhuang, Jan Bakus, Colin Lockard, David M. Ciemiewicz, Sandeep Atluri, Kevin Small, Yangfeng Ji, Heba Elfardy
Published in: ACL (industry) (2023)
Keyphrases
- website
- web pages
- web scale
- web information extraction
- page content
- data extraction
- web documents
- chinese web
- web applications
- news pages
- information extraction
- million images
- web browsing
- web mining
- database
- search engine
- synthetic datasets
- google search
- automatic extraction
- web graph
- linked data
- real world
- web users
- anchor text
- benchmark datasets
- semantic web
- web snippets
- keywords
- content features
- feature set
- text mining
- page layout
- page importance
- end users