Large-scale text processing pipeline with Apache Spark.
Alexey Svyatkovskiy, Kosuke Imai, Mary Kroeger, Yuki Shiraito
Published in: IEEE BigData (2016)
Keyphrases
- processing pipeline
- open source
- text retrieval
- text mining
- open source software
- real world
- database
- information retrieval
- free text
- real life
- text information
- latent semantic analysis
- information extraction
- web server
- small scale
- text data
- text documents
- key concepts
- textual information
- textual data
- text analysis
- structured data
- keywords