Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents.
Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao
Published in: CoRR (2023)
Keyphrases
- general purpose
- vector space
- text documents
- information retrieval
- free text
- text analysis
- digital documents
- textual content
- document analysis
- web documents
- dimensionality reduction
- text content
- text data
- keywords
- newspaper articles
- text information
- text clustering
- text corpus
- document content
- low dimensional
- printed documents
- textual documents
- plagiarism detection
- text retrieval
- text mining
- document processing
- textual information
- textual data
- text collections
- information retrieval systems
- natural language text
- document collections
- topic segmentation
- information extraction
- latent semantic analysis
- document categorization
- electronic documents
- document retrieval
- automatic categorization
- page layout
- key concepts
- vector space model
- relevant documents
- metadata
- document level
- digital libraries
- high dimensional
- distance measure
- text classification
- topic models
- retrieval engine
- automatic summarization
- semantic information
- document representation
- application specific
- document structure
- news stories
- multimedia documents
- scientific literature