SciRepEval: A Multi-Format Benchmark for Scientific Document Representations.
Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman
Published in: EMNLP (2023)
Keyphrases
- document representation
- document clustering
- document collections
- bag of words
- semantically enhanced
- vector space
- vector representation
- language model
- text documents
- multimedia
- metadata
- multiscale
- vector space model
- web documents
- databases
- data fusion
- similarity search
- artificial intelligence
- text data
- similarity measure
- information retrieval
- image features
- machine learning
- prior knowledge
- data mining
- high dimensional