Clustering Document Parts: Detecting and Characterizing Influence Campaigns From Documents.
Zhengxiang Wang, Owen Rambow
Published in: CoRR (2024)
Keyphrases
- document clustering
- document collections
- text clustering
- document clusters
- text documents
- document representation
- web documents
- cosine similarity
- information retrieval systems
- clustering algorithm
- relevant documents
- document similarity
- information retrieval
- k-means
- clustering method
- vector space model
- document classification
- tolerance rough set
- document retrieval
- digital documents
- electronic documents
- structured documents
- topic discovery
- retrieval systems
- similar documents
- document processing
- retrieved documents
- content similarity
- document type
- semi-structured documents
- document content
- document set
- text mining
- index terms
- document space
- keywords
- digital libraries
- latent semantic analysis
- tf-idf
- document archives
- document corpus
- automatic categorization
- related documents
- xml format
- document repository
- document ranking
- user queries
- document structure
- document analysis
- document summarization
- text collections
- query terms
- test collection
- term frequency
- training documents
- latent topics
- textual content
- document level
- query expansion
- multimedia documents
- inverted index
- retrieval strategies
- text categorization
- ranked list
- scientific documents
- topic hierarchy
- bag of words
- language model