PyTabby: a Docreader's module for extracting text and tables from PDF with a text layer (short paper).

Andrey A. Mikhailov, Alexey O. Shigarov, Ilya S. Kozlov

Published in: ITAMS (2021)

Keyphrases

automatically extracted
database
information retrieval
keywords
free text
text data
text analysis
video sequences
text mining
multi layer
pdf files
automatically extracting
application layer
document analysis
natural language text
latent semantic analysis
textual data
key concepts
text documents
maximum likelihood
search engine
data mining