A Large-scale Dataset of (Open Source) License Text Variants.

Stefano Zacchiroli

Published in: MSR (2022)

Keyphrases

open source
open source software
source code
software package
case study
text mining
benchmark datasets
real world
real life
web documents
database
natural language generation
automatically extracted
video collections
small scale
text retrieval
text documents
keywords
information retrieval
semantic markup
million images
data analytics
legacy software systems
textual data
training dataset
synthetic datasets
key concepts
text data