spark-user mailing list archives

From Martin Goodson <mar...@skimlinks.com>
Subject Re: Spark vs Google cloud dataflow
Date Fri, 27 Jun 2014 12:10:12 GMT
My experience is that acquiring 20 spot instances accounts for a tiny
fraction of the total time spent provisioning a cluster with spark-ec2. This
is not (solely) an AWS issue.


-- 
Martin Goodson  |  VP Data Science
(0)20 3397 1240


On Thu, Jun 26, 2014 at 10:14 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Hmm, I remember a discussion on here about how the way in which spark-ec2
> rsyncs stuff to the cluster for setup could be improved, and I’m assuming
> there are other such improvements to be made. Perhaps those improvements
> don’t matter much when compared to EC2 instance launch times, but I’m not
> sure.
>
>
> On Thu, Jun 26, 2014 at 4:48 PM, Aureliano Buendia <buendia360@gmail.com>
> wrote:
>
>>
>>
>>
>> On Thu, Jun 26, 2014 at 9:42 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>>
>>> That’s technically true, but I’d be surprised if there wasn’t a lot of
>>> room for improvement in spark-ec2 regarding cluster launch+config
>>> times.
>>>
>> Unfortunately, this is not a Spark support issue, but an AWS one. Starting a
>> few months ago, Amazon AWS services have been having bigger and bigger
>> lags. Indeed, the default timeout hard-coded in spark-ec2 is no longer
>> long enough to launch the cluster successfully, and many people here reported
>> that they had to increase it.
>>
>>
>>>
>>
>>
>
