Document conversion for cultural heritage texts: FrameMaker to HTML revisited.
Michael Piotrowski. Published in: ACM Symposium on Document Engineering (2010)
Keyphrases
- cultural heritage
- text documents
- digital libraries
- html documents
- electronic documents
- multimedia
- keywords
- information extraction
- document type
- digital objects
- authorship attribution
- document structure
- knowledge society
- web documents
- document classification
- document images
- web based technologies
- virtual museum
- information retrieval
- cidoc crm
- web pages
- information retrieval systems
- document collections
- retrieval systems
- natural language
- topic models
- relevant documents
- text classification
- text mining
- scientific papers
- html pages
- digital collections
- dublin core
- digital archives
- structured documents
- extensible markup language
- text content
- textual content
- metadata