Nomic Embed: Training a Reproducible Long Context Text Embedder.
Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar
Published in: CoRR (2024)
Keyphrases
- text retrieval
- training set
- contextual information
- training process
- textual data
- training examples
- pattern matching
- named entity disambiguation
- text information
- document analysis
- training algorithm
- free text
- context sensitive
- database
- test set
- web documents
- training samples
- context aware
- support vector machine
- website