spark-user mailing list archives

From Matan Safriel <>
Subject Full per node replication level (architecture question)
Date Sat, 24 Jan 2015 21:34:15 GMT

I wonder whether any of the file systems supported by Spark can support a
replication level whereby each node holds a full copy of the data. I realize
this was not the main intended scenario of Spark/Hadoop, but it may be a
good fit for a compute cluster that needs to be very fast over its input
data and whose data amounts to only a few terabytes in total (which fits
nicely on any commodity disk, and soon on any SSD).
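Concretely, I imagine this could be approximated in HDFS by raising the replication factor to the node count; a sketch, assuming a hypothetical 10-node cluster:

```xml
<!-- hdfs-site.xml: replicate every block to all 10 datanodes
     (10 is a placeholder for the actual cluster size) -->
<property>
  <name>dfs.replication</name>
  <value>10</value>
</property>
```

(or per path with `hdfs dfs -setrep -w 10 /data`), so that every task would find its input on local disk.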

It would be nice to use Spark map-reduce over the data, and enjoy automatic
data locality on every node.

It would also be nice to assume Spark can seamlessly manage a job's
workflow across such a cluster...

