PyTabby: a Docreader's module for extracting text and tables from PDF with a text layer (short paper).
Andrey A. Mikhailov, Alexey O. Shigarov, Ilya S. Kozlov
Published in: ITAMS (2021)
Keyphrases
- automatically extracted
- database
- information retrieval
- keywords
- free text
- text data
- text analysis
- video sequences
- text mining
- multi layer
- pdf files
- automatically extracting
- application layer
- document analysis
- natural language text
- latent semantic analysis
- textual data
- key concepts
- text documents
- maximum likelihood
- search engine
- data mining