Text and Code Embeddings by Contrastive Pre-Training.
Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng
Published in: CoRR (2022)
Keyphrases
- training set
- text retrieval
- supervised learning
- source code
- free text
- keywords
- training examples
- training phase
- distance measure
- training samples
- test set
- text documents
- plain text
- database
- text information
- text data
- semantic information
- text categorization
- low dimensional
- text mining
- digital libraries
- similarity measure
- information retrieval