Identifying Parallel Web Documents by Filenames.
Jisong Chen
Chung-Hsing Yeh
Rowena Chau
Published in:
APWeb (2004)
Keyphrases
web documents
information extraction
web pages
web search engines
semi-structured
web content
keywords
document classification
document representation
textual information
html documents
web logs
active learning
link structure
focused crawling
database
vector space model
data model
learning algorithm