Ethically Archiving a Hard-to-Access Massive Research Data Set in the Language Bank of Finland: The Finnish Dark Web Marketplace Corpus (FINDarC).
Krister Lindén, Teemu Ruokolainen, Lasse Hämäläinen, J. Tuomas Harviainen
Published in: Tethics (2023)
Keyphrases
- data sets
- massive data sets
- natural language
- spanish language
- language learning
- data collection
- programming language
- parallel corpus
- database
- data analysis
- web mining
- training data
- manually annotated
- real world
- data sources
- object oriented
- access control
- query expansion
- link analysis
- intelligence and security informatics
- spoken dialog