Document embeddings learned on various types of n-grams for cross-topic authorship attribution.
Helena Gómez-Adorno, Juan Pablo Posadas-Durán, Grigori Sidorov, David Pinto
Published in: Computing (2018)
Keyphrases
- n-gram
- authorship attribution
- writing style
- bag of words
- word level
- language model
- text classification
- language independent
- web documents
- plagiarism detection
- digital forensics
- language modeling
- part of speech
- source code
- topic models
- information retrieval
- document representation
- document retrieval
- document collections
- document images
- document analysis
- document level
- relevance ranking
- query expansion
- scientific papers
- feature space