Using Apache Spark on Hadoop Clusters as Backend for WebLicht Processing Pipelines.

Soheila Sahami, Thomas Eckart, Gerhard Heyer

Published in: CLARIN Annual Conference (2018)

Keyphrases

back end
open source
data management
big data
map reduce
user friendly
clustering algorithm
data processing
building blocks
data types
cloud computing
data sets
data repositories
open source software
distributed systems
information management
parallel computation
source code
data structure
open source projects
database
publish subscribe