CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave
Published in:
LREC (2020)
Keyphrases
high quality
data sets
raw data
database
data analysis
data collection
data structure
end users
data extraction
information sources
search engine
web data
website
web pages
web crawling
data repositories
deep web
log files
web documents
data mining techniques
data points
data sources