A web-based document harmonization and annotation chain: from PDF to RDF.
Thierry Jacquin, Olivier Fambon, Boris Chidlovskii
Published in: ACM Symposium on Document Engineering (2005)
Keyphrases
- pdf files
- metadata
- xml format
- pdf documents
- document images
- document retrieval
- retrieval systems
- social annotations
- probability density function
- semantic web
- semantic annotation
- database
- knowledge base
- linked data
- relational data
- document collections
- effective retrieval
- information retrieval
- document classification
- markup language
- automatic annotation
- databases
- active learning
- query language
- text documents
- structured documents
- keywords
- image retrieval
- scientific documents
- probability distribution function
- data model
- semantic labels
- probability distribution
- information extraction
- information retrieval systems
- image annotation
- wordnet
- mixture model
- semantic information
- document clustering
- relevant documents