A Large-scale Dataset of (Open Source) License Text Variants.
Stefano Zacchiroli
Published in: MSR (2022)
Keyphrases
- open source
- open source software
- source code
- software package
- case study
- text mining
- benchmark datasets
- real world
- real life
- web documents
- database
- natural language generation
- automatically extracted
- video collections
- small scale
- text retrieval
- text documents
- keywords
- information retrieval
- semantic markup
- million images
- data analytics
- legacy software systems
- textual data
- training dataset
- synthetic datasets
- key concepts
- text data