Is Character Trigram Overlapping Ratio Still the Best Similarity Measure for Aligning Sentences in a Paraphrased Corpus?
Aleksandra Smolka, Hsin-Min Wang, Jason S. Chang, Keh-Yih Su. Published in: ROCLING (2022)
Keyphrases
- similarity measure
- sentence level
- measuring similarity
- linguistic features
- part of speech
- link grammar
- text corpus
- lexical features
- training corpus
- multiword
- syntactic features
- Penn Treebank
- semantic roles
- document level
- word sense
- mutual information
- probabilistic context free grammars
- noun phrases
- language model
- tree bank
- image registration
- multi document summarization
- word frequency
- pairwise
- clustering method
- plain text
- n-gram
- similarity function
- writing style
- automatic summarization
- POS tagging
- sentiment analysis
- information retrieval
- manually annotated
- distance measure
- linguistic patterns
- natural language
- similarity computation
- word segmentation
- natural language processing
- optical character recognition
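The measure named in the title, the character trigram overlapping ratio, can be illustrated with a minimal sketch. It assumes a Dice-style normalization (twice the shared trigram count divided by the total trigram count of both sentences). The paper may use a different denominator. The `align` helper is hypothetical. It greedily pairs each source sentence with its highest-scoring target sentence and does not enforce one-to-one alignment.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count overlapping character trigrams in a string (whitespace kept as-is)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def trigram_overlap_ratio(a: str, b: str) -> float:
    """Dice-style overlap of two trigram multisets: 2*|A & B| / (|A| + |B|).

    ASSUMPTION: this normalization is illustrative; the paper may define the
    overlapping ratio with a different denominator.
    """
    ta, tb = char_trigrams(a), char_trigrams(b)
    total = sum(ta.values()) + sum(tb.values())
    if total == 0:
        # Both strings are shorter than three characters, so there is nothing to compare.
        return 0.0
    shared = sum((ta & tb).values())  # multiset intersection keeps min counts
    return 2 * shared / total


def align(source: list[str], target: list[str]) -> list[tuple[int, int, float]]:
    """Hypothetical greedy aligner: pair each source sentence with its best-scoring target."""
    if not target:
        return []
    pairs = []
    for i, s in enumerate(source):
        scores = [trigram_overlap_ratio(s, t) for t in target]
        j = max(range(len(target)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    print(align(["the cat sat"], ["a dog ran", "the cat sat down"]))
```

Scores fall in [0, 1]. A real aligner would typically add a score threshold or a one-to-one constraint on top of this pairwise similarity.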