AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning.
Praneeth Vadlapati
Published in: CoRR (2024)
Keyphrases
- web data
- fine tuning
- web mining
- fine tuned
- viable alternative
- semi structured
- web usage mining
- fine tune
- web pages
- web documents
- web content
- incremental mining
- web sources
- web information
- deep web
- database
- web users
- query logs
- social network analysis
- information extraction
- multimedia
- search engine
- machine learning
- data sets