Identifying Semantically Deviating Outlier Documents.
Honglei Zhuang, Chi Wang, Fangbo Tao, Lance M. Kaplan, Jiawei Han
Published in: EMNLP (2017)
Keyphrases
- information retrieval
- document collections
- semantic information
- text documents
- legal documents
- document retrieval
- web documents
- XML documents
- outlier detection
- metadata
- digital documents
- semantic content
- retrieval systems
- information retrieval systems
- relevant documents
- natural language
- semantically relevant
- information extraction
- document classification
- document representation
- time stamped
- plagiarism detection
- electronic documents
- document clustering
- web data
- digital libraries
- document analysis
- similarity measure