esCorpius: A Massive Spanish Crawling Corpus.
Asier Gutiérrez-Fandiño, David Pérez Fernández, Jordi Armengol-Estapé, David Griol, Zoraida Callejas
Published in: CoRR (2022)
Keyphrases
- Spanish language
- search engine
- web pages
- language identification
- machine translation system
- massive data
- question answering
- resource discovery
- supervised machine learning
- data analysis
- machine learning
- web crawling
- web crawlers
- test set
- web applications
- neural network
- text corpus
- focused crawling
- open domain
- data mining