Steps in building a transcription technology: deciphering the content of historical Romanian documents.
Petru Rebeja, Eduard Coman, Claudiu Marinescu, Dan Cristea. Published in: VIPERC (2023)
Keyphrases
- web documents
- metadata
- textual content
- content management
- multimedia documents
- document content
- information retrieval
- semantic content
- document collections
- semantic relevance
- XML documents
- information retrieval systems
- semantic tags
- historical documents
- semantic information
- content and structure
- multimedia
- relevant content
- PDF files
- web pages
- case study
- keywords
- semi-structured documents
- document structure
- user interests
- historical manuscripts
- text content
- logical structure
- user queries
- relevant documents
- historical data
- document clustering
- structured documents
- document analysis
- language model
- e-learning
- multimedia content
- document type
- vector space model
- digital libraries
- document representation
- word spotting
- web crawler
- co-occurrence
- electronic documents
- online resources
- digital objects
- optical character recognition
- multimedia data
- text documents
- handwriting recognition