Reversing Controlled Document Authoring to Normalize Documents.
Aurélien Max
Published in: EACL (2003)
Keyphrases
- document collections
- text documents
- document clustering
- relevant documents
- document classification
- electronic documents
- document content
- web documents
- information retrieval
- document representation
- semi structured documents
- retrieval systems
- document analysis
- information retrieval systems
- digital documents
- document processing
- similar documents
- retrieved documents
- document retrieval
- structured documents
- document ranking
- vector space model
- document type
- document similarity
- keywords
- document set
- document archives
- document summarization
- multimedia documents
- document repository
- document structure
- textual documents
- term frequency
- document level
- digital libraries
- related documents
- user queries
- index terms
- text mining
- scientific documents
- document images
- test collection
- latent topics
- textual content
- document centric
- document relevance
- text classifiers
- printed documents
- document space
- unstructured documents
- training documents
- text collections
- text categorization
- query terms
- pdf files
- maximal marginal relevance
- scanned documents
- semantic information
- latent semantic analysis
- ranked list
- query specific
- learning objects
- topic models
- xml documents
- vector space
- tf idf
- metadata
- pdf documents
- relevant content
- retrieval strategies
- xml format
- cross references
- information extraction
- web pages
- query expansion
- document corpus
- keyword extraction
- text summarization
- multi document summarization
- handwritten documents