Less Destructive Cleaning of Web Documents by Using Standoff Annotation.
Maik Stührenberg
Published in: WaC@EACL (2014)
Keyphrases
- web documents
- social annotations
- information extraction
- document classification
- semi structured
- web search engines
- web pages
- keywords
- metadata
- textual information
- document representation
- web content
- html documents
- vector space model
- link structure
- image annotation
- active learning
- data extraction
- image retrieval
- web data
- database
- document clustering
- structured documents
- data mining