spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Spark vs Google cloud dataflow
Date Fri, 27 Jun 2014 12:16:41 GMT
On Thu, Jun 26, 2014 at 9:15 AM, Aureliano Buendia <buendia360@gmail.com> wrote:
> Summingbird is for map/reduce. Dataflow is the third generation of google's
> map/reduce, and it generalizes map/reduce the way Spark does. See more about
> this here: http://youtu.be/wtLJPvx7-ys?t=2h37m8s

Yes, my point was that Summingbird is similar in that it is a
higher-level service for batch/streaming computation, not that it is
similar because it is MapReduce-based.

> It seems Dataflow is based on this paper:
> http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf

FlumeJava maps to Crunch in the Hadoop ecosystem. I think Dataflow is
more than that, but yes, FlumeJava seems to be some of the 'language'.
It is similar in that it is a distributed collection abstraction.
