spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: compare/contrast Spark with Cascading
Date Mon, 28 Oct 2013 18:08:49 GMT
Hi Philip,

Indeed, Spark's API lets you build complex workflows directly, much as Cascading does.
Cascading built that functionality on top of MapReduce (translating user operations down to
a series of MapReduce jobs), whereas Spark's engine has supported complex workflows from the
start, and its API maps directly onto them. So in this respect they are alternatives. Of course,
you can also mix both in a deployment because they can share data through HDFS.

There may be other differences as well -- for example, Cascading has a specific data model
for interchange between the operators (each record has to be a tuple), while Spark works directly
on Java objects, and Spark also has Python and Scala APIs.
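
To make the data-model point concrete, here is a minimal sketch in plain Python. It uses neither Cascading's nor Spark's API; the `PageView` type, the sample data, and both functions are hypothetical names invented for illustration. It only contrasts operators that see flat tuples with operators that see the application's own objects.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record type, used only for this illustration.
@dataclass
class PageView:
    user: str
    url: str
    millis: int


views = [
    PageView("ann", "/home", 120),
    PageView("bob", "/home", 300),
    PageView("ann", "/docs", 80),
]


def total_millis_tuples(rows):
    """Tuple-style model: each record is a flat tuple addressed by position."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[0]] += row[2]  # field 0 = user, field 2 = millis
    return dict(totals)


def total_millis_objects(records):
    """Object-style model: operators receive the application's own objects."""
    totals = defaultdict(int)
    for record in records:
        totals[record.user] += record.millis
    return dict(totals)


# The same data, flattened into tuples for the tuple-style operator.
as_tuples = [(v.user, v.url, v.millis) for v in views]

tuple_totals = total_millis_tuples(as_tuples)
object_totals = total_millis_objects(views)
```

Both functions compute the same per-user totals. The difference is only whether the record has to be converted to a tuple layout before an operator can consume it.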

Matei

On Oct 28, 2013, at 10:11 AM, Philip Ogren <philip.ogren@oracle.com> wrote:

> 
> My team is investigating a number of technologies in the Big Data space. A team member
> recently got turned on to Cascading as an application layer for orchestrating complex
> workflows/scenarios. He asked me if Spark had an "application layer". My initial reaction
> is "no": Spark would not have a separate orchestration/application layer. Instead, the core
> Spark API (along with Streaming) would compete directly with Cascading for this kind of
> functionality, and the two would not likely be all that complementary. I realize that I am
> exposing my ignorance here and could be way off. Is there anyone who knows a bit about both
> of these technologies who could speak to this in broad strokes?
> 
> Thanks!
> Philip
> 

