Semantic Document Selection: Historical Research on Collections That Span Multiple Centuries
Daan Odijk, Ork de Rooij, Maria-Hendrike Peetz, Toine Pieters, Maarten de Rijke, Stephen Snelders
Published in: TPDL (2012)
Keyphrases
- document collections
- information retrieval
- semantic information
- text collections
- information retrieval systems
- automatic text classification
- document content
- semantic structure
- historical data
- semantically related
- document archives
- document retrieval
- document classification
- metadata
- high level
- vector space model
- digital libraries
- semantic annotation
- semantic similarity
- domain ontology
- semantic web
- data sets
- historical documents
- document-centric
- relevant documents
- document space
- natural language
- semantic content
- test collection
- domain-specific
- effective retrieval
- tf-idf
- similar documents
- text representation
- structured documents
- document representation
- selection algorithm
- document clustering
- document images
- web documents