On Inter-dataset Code Duplication and Data Leakage in Large Language Models.

José Antonio Hernández López, Boqi Chen, Tushar Sharma, Dániel Varró
Published in: CoRR (2024)
Keyphrases
  • language model
  • training data
  • context sensitive
  • active learning
  • test collection
  • language modeling