esCorpius: A Massive Spanish Crawling Corpus.
Asier Gutiérrez-Fandiño, David Pérez Fernández, Jordi Armengol-Estapé, David Griol, Zoraida Callejas
Published in: IberSPEECH (2022)
Keyphrases
- Spanish language
- search engine
- web mining
- data analysis
- web pages
- web crawling
- massive data
- language identification
- coreference resolution
- machine learning
- supervised machine learning
- machine translation system
- test set
- information retrieval systems
- data sets
- manually annotated
- noun phrases
- focused crawling
- open domain
- spoken dialog