CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data.
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave
Published in: CoRR (2019)
Keyphrases
high quality
data sets
database
data collection
raw data
data analysis
website
data structure
web applications
web content
web data
web pages
data points
spatial data
information access
data extraction
linked open data
information sources
knowledge discovery
data sources
training data