Enriching a document collection by integrating information extraction and PDF annotation.
Brett Powley, Robert Dale, Ilya Anisimoff
Published in: DRR (2009)
Keyphrases
- document collections
- information extraction
- information retrieval
- information retrieval systems
- document retrieval
- text retrieval
- probability density function
- test collection
- text mining
- free text
- digital libraries
- natural language processing
- precision and recall
- relevant documents
- active learning
- document clustering
- named entities
- textual data
- question answering
- metadata
- ad hoc retrieval
- index terms
- semi-structured
- cross-language
- named entity recognition
- machine learning
- document set
- image annotation
- geographic information retrieval
- query terms
- web mining
- text documents
- structured data
- web documents
- machine translation
- document representation
- relevance feedback
- web pages
- search engine
- database
- result lists
- document archives