spark-user mailing list archives

From Ashic Mahtab <as...@live.com>
Subject RE: Full per node replication level (architecture question)
Date Sat, 24 Jan 2015 22:50:27 GMT
You could look at using Cassandra for storage. Spark integrates nicely with Cassandra, and
a combination of Spark + Cassandra would give you fast access to structured data in Cassandra,
while enabling analytic scenarios via Spark. Cassandra would take care of the replication,
as it's one of the core features of the database.
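As a rough sketch of how "a full copy on every node" could be expressed on the Cassandra side: in a single-datacenter cluster using SimpleStrategy, setting the keyspace's replication factor equal to the number of nodes means every node holds a replica of every row. The helper name `full_replication_cql`, the keyspace name `analytics`, and the node count of 5 below are purely illustrative assumptions, not anything taken from this thread.

```python
def full_replication_cql(keyspace: str, node_count: int) -> str:
    """Build a CREATE KEYSPACE statement whose replication factor
    equals the number of nodes, so each node stores every row.

    Assumes a single-datacenter cluster using SimpleStrategy.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if not keyspace.isidentifier():
        # Keep the sketch safe: only accept plain identifier-like names.
        raise ValueError("keyspace must be a simple identifier")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {node_count}}};"
    )


# Hypothetical 5-node cluster and keyspace name.
print(full_replication_cql("analytics", 5))
```

The resulting statement would be run through cqlsh or a driver. Note that the replication factor would need to be raised by hand whenever nodes are added, and that writes at high consistency levels get more expensive as the factor grows.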

Date: Sat, 24 Jan 2015 23:34:15 +0200
Subject: Full per node replication level (architecture question)
From: dev.matan@gmail.com
To: user@spark.incubator.apache.org

Hi,
I wonder whether any of the file systems supported by Spark can provide a replication
level whereby each node has a full copy of the data. I realize this was not the main intended
scenario of Spark/Hadoop, but it may be a good fit for a compute cluster that needs to be very
fast over its input data and that holds only a few terabytes in total (which
fits nicely on any commodity disk, and soon on any SSD).
It would be nice to use Spark map-reduce over the data and enjoy automatic replication.
It would also be nice to assume Spark can seamlessly manage a job's workflow across such a cluster...
Thanks!
Matan