Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance.
Christian Buck, Philipp Koehn
Published in: WMT (2016)
Keyphrases
- tf-idf
- cosine distance
- weighting schemes
- term frequency
- information retrieval
- document clustering
- vector space model
- text documents
- weighting scheme
- text categorization
- retrieval model
- term weighting
- ranking algorithm
- text mining
- data mining
- retrieval systems
- semantic similarity
- semantic information
- text classification
- co-occurrence
- information extraction
- machine learning