Mining Text Outliers in Document Directories.
Edouard Fouché, Yu Meng, Fang Guo, Honglei Zhuang, Klemens Böhm, Jiawei Han
Published in: ICDM (2020)
Keyphrases
- text mining
- text documents
- document analysis
- keywords
- web documents
- digital documents
- document processing
- information retrieval
- textual content
- document content
- textual documents
- text clustering
- document classification
- text collections
- text content
- multimedia documents
- semantic information
- document images
- web pages
- document clustering
- data mining
- printed documents
- document structure
- scientific documents
- pdf files
- outlier mining
- retrieval engine
- page layout analysis
- scanned documents
- structured documents
- document set
- retrieval systems
- database
- document corpus
- automatic text summarization
- electronic documents
- document categorization
- document level
- web mining
- text summarization
- free text
- information retrieval systems
- scientific papers
- text retrieval
- text corpus
- text classification
- latent semantic analysis
- extractive summarization
- outlier detection
- keyword extraction
- knowledge discovery
- document representation
- text data
- sequential patterns
- search engine
- data points
- relevant documents
- natural language processing
- document retrieval
- language model
- topic models