Large-scale text processing pipeline with Apache Spark.

Alexey Svyatkovskiy, Kosuke Imai, Mary Kroeger, Yuki Shiraito

Published in: IEEE BigData (2016)

Keyphrases

processing pipeline
open source
text retrieval
text mining
open source software
real world
database
information retrieval
free text
real life
text information
latent semantic analysis
information extraction
web server
small scale
text data
text documents
key concepts
textual information
textual data
text analysis
structured data
keywords