Handling Artifacts in Digitally Reproduced Documents.
Luigi Cinque, Stefano Levialdi, Luca Lombardi, Steven L. Tanimoto
Published in: CAMP (2000)
Keyphrases
- information retrieval
- document collections
- document classification
- text documents
- relevant documents
- legal documents
- document clustering
- information retrieval systems
- web documents
- keywords
- document representation
- retrieval systems
- xml documents
- document retrieval
- high quality
- free text
- data sets
- vector space model
- text analysis
- multimedia documents
- database
- electronic documents
- textual content
- latent semantic indexing
- user queries
- information extraction