JURD: Joiner of Un-Readable Documents to reverse tokenization attacks to content-based spam filters.
Igor Santos, Carlos Laorden, Borja Sanz, Pablo García Bringas
Published in: CCNC (2013)
Keyphrases
- spam filters
- spam filtering
- information retrieval
- text documents
- document collections
- anti spam
- spam emails
- named entities
- information retrieval systems
- text classification
- document retrieval
- metadata
- image retrieval
- machine learning methods
- xml documents
- web documents
- content similarity
- email spam
- relevant documents
- keywords
- document clustering
- image spam
- document representation
- watermarking scheme
- multimedia
- feature selection
- machine learning
- vector space model
- query terms
- web search
- biomedical text
- neural network