Business Insight from Collection of Unstructured Formatted Documents with IBM Content Harvester.
Biplav Srivastava, Yuan-Chi Chang
Published in: COMAD (2009)
Keyphrases
- document collections
- pdf files
- related documents
- web documents
- structured data
- metadata
- unstructured data
- effective retrieval
- semi-structured
- digital collections
- unstructured information
- unstructured documents
- textual content
- document content
- semi-structured data
- meta information
- text collections
- content and structure
- xml documents
- time-stamped
- information retrieval
- information retrieval systems
- document retrieval
- distributed information retrieval
- relevant documents
- digital libraries
- semantic content
- textual data
- automatic categorization
- text content
- relevant content
- database
- information systems
- textual information
- multimedia documents
- structured documents
- document repositories
- information extraction
- business processes
- semantic information
- test collection
- retrieval systems
- digital objects
- unstructured text
- text documents
- logical structure
- multimedia data
- document clustering
- business intelligence
- electronic documents
- web content
- digital content
- document set
- information technology infrastructure
- text mining
- relational databases
- keywords
- multimedia
- data mining