BoB, a best-of-breed automated text de-identification system for VHA clinical documents.
Óscar Ferrández, Brett R. South, Shuying Shen, F. Jeffrey Friedlin, Matthew H. Samore, Stéphane M. Meystre
Published in: J. Am. Medical Informatics Assoc. (2013)
Keyphrases
- text documents
- free text
- information retrieval
- digital documents
- keywords
- document analysis
- text analysis
- text retrieval
- latent semantic analysis
- plagiarism detection
- patient records
- textual data
- newspaper articles
- web documents
- textual content
- text data
- document categorization
- text information
- document content
- text collections
- document processing
- text content
- text clustering
- medical records
- electronic documents
- document structure
- natural language text
- textual documents
- information retrieval systems
- multimedia documents
- page layout
- printed documents
- document clustering
- automatic categorization
- text mining
- text segments
- document collections
- information extraction
- linguistic analysis
- metadata
- semantic information
- textual information
- journal articles
- key concepts
- topic segmentation
- document set
- document level
- relevant documents
- retrieval systems
- document retrieval
- text lines
- text corpus
- scientific literature
- text classification
- extractive summarization
- patient data
- semantic content
- structured documents
- handwritten text
- sentence level
- multiword
- multi document summarization
- vector space model
- document repositories
- scanned documents
- retrieval engine
- automatic summarization
- handwritten documents
- news stories
- structured data
- topic models
- text categorization
- xml documents
- digital libraries
- search engine