spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject Re: What benefits do we really get out of colocation?
Date Sat, 03 Dec 2016 08:42:08 GMT
Wait, how is that a benefit? Isn't that a bad thing, if you are saying that
colocating leads to more latency and a longer overall execution time?

On Sat, Dec 3, 2016 at 12:34 AM, vincent gromakowski <
vincent.gromakowski@gmail.com> wrote:

> You get more latency on reads so overall execution time is longer
>
> On Dec 3, 2016 7:39 AM, "kant kodali" <kanth909@gmail.com> wrote:
>
>>
>> I wonder what benefits I really get if I colocate my Spark worker
>> process and Cassandra server process on each node?
>>
>> I understand the concept of moving compute towards the data instead of
>> moving data towards computation, but it sounds more like one is trying to
>> optimize for network latency.
>>
>> The majority of my nodes (m4.xlarge) have 1 Gbps = 125 MB/s (megabytes per
>> second) of network throughput,
>>
>> and the disk throughput for m4.xlarge is 93.75 MB/s (link below):
>>
>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
>>
>> So in this case I don't see how colocation can help, even if there is a
>> one-to-one mapping from each Spark worker node to a colocated Cassandra
>> node, where say we are doing a table scan of a billion rows?
>>
>> Thanks!
>>
>>
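
As a rough sanity check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope model for a single node: the two throughput numbers come from the thread, while the row size (BYTES_PER_ROW) and the helper names are hypothetical, and the model ignores latency, compression, caching, and parallelism across nodes.

```python
# Back-of-the-envelope comparison of the two throughput figures in the thread.
NETWORK_GBPS = 1.0       # network figure quoted in the thread (gigabits/s)
DISK_MB_PER_S = 93.75    # disk figure quoted in the thread (megabytes/s)

ROWS = 1_000_000_000     # "table scan of a billion rows"
BYTES_PER_ROW = 100      # hypothetical average row size, not from the thread


def gbps_to_mb_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def scan_seconds(rows, bytes_per_row, mb_per_s):
    """Time to move rows * bytes_per_row bytes at a sustained mb_per_s."""
    total_mb = rows * bytes_per_row / 1_000_000
    return total_mb / mb_per_s


network_mb_per_s = gbps_to_mb_per_s(NETWORK_GBPS)

# Whichever path is slower bounds how fast a single node can stream rows.
bottleneck_mb_per_s = min(network_mb_per_s, DISK_MB_PER_S)

if __name__ == "__main__":
    print("network MB/s:", network_mb_per_s)
    print("disk MB/s:", DISK_MB_PER_S)
    print("scan seconds at network rate:",
          scan_seconds(ROWS, BYTES_PER_ROW, network_mb_per_s))
    print("scan seconds at disk rate:",
          scan_seconds(ROWS, BYTES_PER_ROW, DISK_MB_PER_S))
```

Under these assumptions the disk, not the network, is the slower path, which is the point the question raises about raw throughput. It says nothing about per-request latency, which is what the reply above refers to.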
