spark-user mailing list archives

From Sean Owen <>
Subject Re: Spark vs Google cloud dataflow
Date Fri, 27 Jun 2014 12:16:41 GMT
On Thu, Jun 26, 2014 at 9:15 AM, Aureliano Buendia <> wrote:
> Summingbird is for map/reduce. Dataflow is the third generation of Google's
> map/reduce, and it generalizes map/reduce the way Spark does. See more about
> this here:

Yes, my point was that Summingbird is similar in that it is a
higher-level service for batch/streaming computation, not that it is
similar for being MapReduce-based.

> It seems Dataflow is based on this paper:

FlumeJava maps to Crunch in the Hadoop ecosystem. I think Dataflow is
more than that, but yeah, that seems to supply some of its 'language'. It is
similar in that it is a distributed collection abstraction.
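To make "distributed collection abstraction" concrete, here is a minimal, purely local Python sketch. It does not reproduce the real FlumeJava, Crunch, Spark or Dataflow APIs. `PartitionedCollection` and its methods are hypothetical names. The data is split into partitions, and transformations like `map` and `filter` are only recorded. They run partition by partition when an action such as `collect` or `reduce` is called. Real systems ship each partition to a worker, but here a plain loop stands in for that.

```python
from functools import reduce as fold


class PartitionedCollection:
    """Hypothetical toy: a collection split into partitions whose
    transformations are deferred until an action runs. Everything
    executes locally; no real cluster or library API is involved."""

    def __init__(self, partitions, ops=None):
        self._partitions = partitions
        # Deferred transformations, each taking and returning one partition (a list).
        self._ops = ops or []

    @classmethod
    def from_iterable(cls, data, num_partitions=2):
        items = list(data)
        # Round-robin split into num_partitions lists.
        parts = [items[i::num_partitions] for i in range(num_partitions)]
        return cls(parts)

    def map(self, f):
        # Record the transformation; nothing is computed yet.
        return PartitionedCollection(
            self._partitions, self._ops + [lambda part: [f(x) for x in part]]
        )

    def filter(self, pred):
        return PartitionedCollection(
            self._partitions, self._ops + [lambda part: [x for x in part if pred(x)]]
        )

    def _compute(self):
        # Apply every recorded op to each partition independently,
        # as separate workers would in a real system.
        results = []
        for part in self._partitions:
            for op in self._ops:
                part = op(part)
            results.append(part)
        return results

    def collect(self):
        # Action: flatten all partitions (order follows the partitioning).
        return [x for part in self._compute() for x in part]

    def reduce(self, f):
        # Action: reduce each non-empty partition, then combine the partial results.
        # f should be associative and commutative for this to be well defined.
        # Raises TypeError if every partition is empty.
        partials = [fold(f, part) for part in self._compute() if part]
        return fold(f, partials)
```

For example, `PartitionedCollection.from_iterable(range(10), num_partitions=3).map(lambda x: x * x).filter(lambda x: x % 2 == 0)` describes a pipeline without running it. Calling `.reduce(lambda a, b: a + b)` on that result then evaluates it and sums the even squares. The point of the abstraction is that user code reads like operations on one collection, while the engine decides how the partitions are processed.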
