CDA: a Cost Efficient Content-based Multilingual Web Document Aligner.
Thuy Vu, Alessandro Moschitti
Published in: EACL (2021)
Keyphrases
- cost efficient
- web documents
- content similarity
- image retrieval
- information extraction
- semi structured
- multimedia
- web search engines
- web data
- language independent
- digital libraries
- prefetching
- web content
- web pages
- keywords
- textual information
- html documents
- cross language
- governmental organizations
- focused crawling
- cross lingual
- vector space model
- text documents
- dynamically generated
- security model
- double auction
- web logs
- website
- relevance feedback