spark-user mailing list archives

From Dean Wampler <deanwamp...@gmail.com>
Subject Re: Spark vs Google cloud dataflow
Date Fri, 27 Jun 2014 12:40:42 GMT
... and to be clear on the point, Summingbird is not limited to MapReduce.
It abstracts over Scalding (which abstracts over Cascading, which is being
moved from MR to Spark) and over Storm for event processing.


On Fri, Jun 27, 2014 at 7:16 AM, Sean Owen <sowen@cloudera.com> wrote:

> On Thu, Jun 26, 2014 at 9:15 AM, Aureliano Buendia <buendia360@gmail.com> wrote:
> > Summingbird is for map/reduce. Dataflow is the third generation of
> > Google's map/reduce, and it generalizes map/reduce the way Spark does.
> > See more about this here: http://youtu.be/wtLJPvx7-ys?t=2h37m8s
>
> Yes, my point was that Summingbird is similar in that it is a
> higher-level service for batch/streaming computation, not that it is
> similar for being MapReduce-based.
>
> > It seems Dataflow is based on this paper:
> > http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf
>
> FlumeJava maps to Crunch in the Hadoop ecosystem. I think Dataflow is
> more than that, but yeah, that seems to be some of the 'language'. It is
> similar in that it is a distributed collection abstraction.
>



-- 
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com
