A Large-scale Dataset of (Open Source) License Text Variants.

Stefano Zacchiroli

Published in: CoRR (2022)

Keyphrases

open source
source code
open source software
software package
case study
real life
text mining
information retrieval
text retrieval
natural language generation
database
synthetic datasets
small scale
open source projects
key concepts
benchmark datasets
string matching
real world
automatically extracted
data analytics
million images
textual data
training dataset
text data
free text
text classification
machine learning
data sets