Towards High-Quality Text Stream Extraction from PDF. Technical Background to the ACL 2012 Contributed Task.
Øyvind Raddum Berg, Stephan Oepen, Jonathon Read
Published in: Discoveries@ACL (2012)
Keyphrases
- high quality
- text extraction
- automatically extracted
- information retrieval
- pdf files
- text retrieval
- database
- sliding window
- automatically extracting
- text mining
- data streams
- probability density function
- printed text
- text information
- keywords
- free text
- higher quality
- higher order statistics
- real time
- automatic extraction
- text data
- mixture model
- image quality
- information extraction
- foreground objects
- text documents
- text classification
- entity extraction
- digital libraries
- natural image matting