Continuous word embeddings for detecting local text reuses at the semantic level.
Qi Zhang, Jihua Kang, Jin Qian, Xuanjing Huang
Published in: SIGIR (2014)
Keyphrases
- semantic level
- semantic similarity
- context dependent
- word pairs
- keywords
- sentence similarity
- text corpus
- string matching
- sentence level
- text input
- natural language text
- printed text
- english words
- lexical features
- related words
- co-occurrence
- word counts
- english text
- text retrieval
- chinese text
- word level
- syntactic categories
- word sense
- linguistic information
- word sense disambiguation
- vector space
- lexical information
- syntactic analysis
- printed documents
- multiword
- text segments
- n-gram
- database
- unknown words
- stop words
- page layout
- punctuation marks
- machine translation system
- search engine
- text mining
- low dimensional
- document images
- noun phrases
- text corpora
- concept space
- document analysis
- word segmentation
- character recognition
- text documents
- word recognition
- semantic information
- handwriting recognition
- spoken documents
- handwritten words
- named entity recognizer