The Stack: 3 TB of permissively licensed source code.
Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Yacine Jernite, Margaret Mitchell, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, Harm de Vries
Published in: Trans. Mach. Learn. Res. (2023)
Keyphrases
- source code
- software systems
- open source
- open source software
- static analysis
- plagiarism detection
- software projects
- software maintenance
- change impact analysis
- open source projects
- text files
- legacy systems
- execution traces
- version control
- source files
- code examples
- software repositories
- object oriented systems
- authorship attribution
- mailing lists
- mining software repositories
- code reuse