PLAtE: A Large-scale Dataset for List Page Web Extraction.
Aidan San, Jan Bakus, Colin Lockard, David M. Ciemiewicz, Yangfeng Ji, Sandeep Atluri, Kevin Small, Heba Elfardy
Published in: CoRR (2022)
Keyphrases
- website
- web pages
- web information extraction
- chinese web
- web scale
- data extraction
- web browsing
- web content
- web applications
- home page
- web documents
- news pages
- million images
- keywords
- page content
- linked data
- web snippets
- google search
- real world
- database
- semantic web
- web users
- web mining
- massive scale
- page layout
- information sources
- web graph
- web data
- link structure
- web news
- content features
- hyperlink structure
- web communities
- web technologies
- user generated content
- web search
- information extraction
- real life
- web log mining