More than Words is What you Need - Detecting DGA and Phishing Domains with Dom2Vec Word Embeddings.
Lucas Torrealba Aravena, Pedro Casas, Javier Bustos-Jiménez, Mislav Findrik
Published in: TMA (2024)
Keyphrases
- n gram
- related words
- english words
- word sense disambiguation
- word recognition
- word pairs
- word frequencies
- unknown words
- word meaning
- word segmentation
- keywords
- word similarity
- text corpus
- query words
- multiword
- lexical information
- word spotting
- linguistic information
- stop words
- word level
- syntactic categories
- handwritten words
- spoken document retrieval
- out of vocabulary
- website
- training corpus
- word sense
- countermeasures
- chinese word segmentation
- lexical features
- language model
- numeral strings
- word meanings
- malicious activities
- noun phrases
- word frequency
- language specific
- natural language text
- short list
- vector space
- frequency counts
- linguistic knowledge
- compound words
- speech recognition systems
- translation model
- chinese text
- xml documents
- co occurrence
- low dimensional
- word co occurrence
- printed text
- information retrieval
- cross language information retrieval
- cross domain
- parallel corpus
- online resources
- handwriting recognition
- arabic documents
- automatic transcription