Proving membership in LLM pretraining data via data watermarks.
Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia
Published in: CoRR (2024)
Keyphrases
- data sets
- data analysis
- raw data
- training data
- data structure
- statistical analysis
- data quality
- original data
- high quality
- data sources
- synthetic data
- database
- data collection
- knowledge discovery
- experimental data
- small number
- data points
- data processing
- input data
- training set
- association rules
- feature selection
- information systems
- complex data