SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task.
Ziije Zhong, Linqing Zhong, Zhaoze Sun, Qingyun Jin, Zengchang Qin, Xiaofan Zhang
Published in: CoRR (2024)
Keyphrases
- synthetic data
- language model
- fine tuning
- information retrieval
- language modeling
- n gram
- document level
- document retrieval
- probabilistic model
- retrieval model
- text retrieval
- multiword
- query expansion
- context sensitive
- language modelling
- data sets
- real world
- real image data
- speech recognition
- fine tuned
- statistical language models
- language models for information retrieval
- smoothing methods
- test collection
- text mining
- vector space model
- pseudo relevance feedback
- ad hoc information retrieval
- keywords
- query terms
- semantic information
- translation model
- spoken term detection
- okapi bm25
- feature selection
- document ranking
- co occurrence
- relevance model
- text documents