spark-user mailing list archives

From Michael Bach Bui <free...@adatao.com>
Subject Re: Spark vs Google cloud dataflow
Date Thu, 26 Jun 2014 18:26:25 GMT
"The current problem with Spark is the big overhead and cost of bringing up
a cluster. On a good day, it takes AWS spot instances 15 - 20 minutes to
bring up a 30 node cluster. This makes it non-efficient for computations
which may take only 10 - 15 minutes."

Hmm, this is misleading. The overhead of bringing up AWS spot instances for
a Spark cluster is NOT an inherent problem of Spark.
If you have a cluster that is already running, a Spark job can be started
within ~100ms.
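
To put rough numbers on that, here is a minimal sketch of the arithmetic. It uses the figures quoted above (15 - 20 minutes to provision, 10 - 15 minutes of computation) and the ~100ms job start on a running cluster; the function name startup_overhead_fraction is made up for illustration and is not part of Spark.

```python
def startup_overhead_fraction(startup_s: float, compute_s: float) -> float:
    """Fraction of total wall-clock time spent on startup rather than computation."""
    return startup_s / (startup_s + compute_s)


# Cold start, using the figures from the quoted message (converted to seconds).
cold_low = startup_overhead_fraction(15 * 60, 15 * 60)   # 15 min startup, 15 min job
cold_high = startup_overhead_fraction(20 * 60, 10 * 60)  # 20 min startup, 10 min job

# Cluster already running: assume ~100 ms to start a 10 minute job.
warm = startup_overhead_fraction(0.1, 10 * 60)

print(f"cold start overhead: {cold_low:.0%} to {cold_high:.0%}")
print(f"running cluster overhead: {warm:.4%}")
```

Under these assumptions, provisioning accounts for half or more of the wall-clock time, while on a cluster that is already up the startup share is negligible.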

Best,


On Thu, Jun 26, 2014 at 7:15 AM, Aureliano Buendia <buendia360@gmail.com>
wrote:

>
>
>
> On Thu, Jun 26, 2014 at 10:58 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> My first reaction was that Dataflow mapped more to Summingbird, as part
>>
>
> Summingbird is for map/reduce. Dataflow is the third generation of
> Google's map/reduce, and it generalizes map/reduce the way Spark does. See
> more about this here: http://youtu.be/wtLJPvx7-ys?t=2h37m8s
>
> It seems Dataflow is based on this paper:
> http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf
>
> The paper mentions in-memory computation a few times, but I'm not sure how
> closely Google's implementation resembles Spark when it comes to in-memory
> computation.
>
> The current problem with Spark is the big overhead and cost of bringing up
> a cluster. On a good day, it takes AWS spot instances 15 - 20 minutes to
> bring up a 30 node cluster. This makes it non-efficient for computations
> which may take only 10 - 15 minutes.
>
>
>> of it is a higher-level system for doing a specific thing in
>> batch/streaming -- aggregations.
>>
>> On Wed, Jun 25, 2014 at 8:23 PM, Aureliano Buendia <buendia360@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > Today Google announced their Cloud Dataflow, which is very similar to
>> > Spark in performing batch processing and stream processing.
>> >
>> > How does Spark compare to Google Cloud Dataflow? Are they trying to
>> > solve the same problem?
>> >
>> >
>>
>
>


-- 

Michael B. Bui, PhD,
Senior Software Architect, ADATAO Inc.
www.adatao.com
