DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond.
Cong Yao
Published in: CoRR (2023)
Keyphrases
- open source
- open source software
- information retrieval systems
- source code
- information retrieval
- document collections
- document images
- retrieval systems
- text documents
- natural language processing
- syntactic analysis
- natural language
- document retrieval
- document clustering
- case study
- document classification
- relevant documents
- digital documents
- database
- keywords
- high level
- map reduce
- vector space model
- semantic information
- user friendly
- user queries
- unsupervised learning
- digital libraries
- real world