OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text.
Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba
Published in: CoRR (2023)
Keyphrases
- high quality
- web documents
- website
- textual data
- text information
- information retrieval and extraction
- web applications
- web pages
- database
- text mining
- semantic web
- textual features
- ground truth
- web images
- information retrieval
- keywords
- newspaper articles
- multi lingual
- linked data
- text retrieval
- web data
- anchor text
- high resolution
- digital documents
- text content
- low quality
- web resources
- web users
- text documents
- key concepts
- link analysis
- free text
- page layout