CDA: a Cost Efficient Content-based Multilingual Web Document Aligner.
Thuy Vu, Alessandro Moschitti
Published in: CoRR (2021)
Keyphrases
- cost efficient
- web documents
- content similarity
- image retrieval
- information extraction
- web search engines
- digital libraries
- semi-structured
- web pages
- keywords
- html documents
- prefetching
- vector space model
- multimedia
- web data
- textual information
- web logs
- governmental organizations
- n-gram
- document representation
- user interaction
- relevance feedback
- unstructured documents