OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text.
Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba
Published in: ICLR (2024)
Keyphrases
- high quality
- web documents
- textual data
- text information
- website
- database
- information retrieval and extraction
- web applications
- web images
- textual features
- text retrieval
- information retrieval
- web scale
- image quality
- web pages
- text content
- web mining
- digital documents
- low quality
- multilingual
- content features
- web data
- web users
- linked data
- text mining
- digital libraries
- keywords
- web resources
- free text
- synthetic datasets
- web content
- semantic information
- information sources
- web search
- newspaper articles
- high resolution
- mathematical formulas
- textual case-based reasoning