spark-user mailing list archives

From kant kodali <>
Subject Re: What benefits do we really get out of colocation?
Date Sat, 03 Dec 2016 17:42:15 GMT
Ephemeral storage on SSD will be very painful to maintain, especially with
large datasets. We will pretty soon have data somewhere in the PB range.

I am thinking of leveraging something like the below, but I am not sure how
much performance gain we could get out of it.

On Sat, Dec 3, 2016 at 8:28 AM, vincent gromakowski <> wrote:

> What about ephemeral storage on SSD? If performance is required, it's
> generally for production, so the cluster would never be stopped. A Spark
> job to back up to and restore from S3 would then allow the cluster to be
> shut down completely.
>
> On 3 Dec 2016 at 1:28 PM, "David Mitchell" <> wrote:
>> To get a node local read from Spark to Cassandra, one has to use a read
>> consistency level of LOCAL_ONE.  For some use cases, this is not an
>> option.  For example, if you need to use a read consistency level
>> of LOCAL_QUORUM, as many use cases demand, then one is not going to get a
>> node local read.
>> Also, to ensure a node local read, one has to set spark.locality.wait to
>> zero.  Whether or not a partition will be streamed to another node or
>> computed locally depends on the spark.locality.wait parameter. This
>> parameter can be set to 0 to force all partitions to only be computed on
>> local nodes.
>> If you do some testing, please post your performance numbers.
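
For anyone who wants to run that test, here is a minimal sketch of the two settings named in the quoted message, rendered as spark-submit flags. It assumes the Spark Cassandra Connector is in use and that its read consistency level is controlled by the spark.cassandra.input.consistency.level property; the helper names locality_conf and to_submit_args are hypothetical, made up for this illustration. The value 0 for spark.locality.wait is taken from the message as written. Spark's own documentation describes that property as how long to wait to launch a data-local task before falling back to a less-local node, so it is worth measuring the effect on your cluster rather than assuming it.

```python
# Sketch only: collects the two properties discussed in the quoted message
# and formats them as spark-submit arguments. Helper names are hypothetical.

def locality_conf(consistency_level="LOCAL_ONE", locality_wait="0"):
    """Return the Spark properties discussed above as a dict."""
    return {
        # Read consistency level used by the Spark Cassandra Connector
        # (assumed property name: spark.cassandra.input.consistency.level).
        "spark.cassandra.input.consistency.level": consistency_level,
        # Scheduler locality wait; "0" is the value suggested in the message.
        "spark.locality.wait": locality_wait,
    }


def to_submit_args(conf):
    """Turn a dict of properties into a flat list of --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_submit_args(locality_conf())))
```

Passing consistency_level="LOCAL_QUORUM" to locality_conf produces the flags for the other case mentioned in the message, which makes it easy to compare the two runs side by side.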
