spark-user mailing list archives

From Ashic Mahtab <as...@live.com>
Subject RE: What does "Spark is not just MapReduce" mean? Isn't every Spark job a form of MapReduce?
Date Sun, 28 Jun 2015 16:51:04 GMT
Spark comes with quite a few components. At its core is... surprise... Spark Core. This provides
the core things required to run Spark jobs. Spark provides a lot of operators out of the box... take
a look at:
https://spark.apache.org/docs/latest/programming-guide.html#transformations
https://spark.apache.org/docs/latest/programming-guide.html#actions
While all of them can be implemented with variations of rdd.map().reduce(), the dedicated operators
let Spark apply optimisations (data locality, etc.), and they simply make life easier.
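To make the "optimisations" point a bit more concrete, here's a toy pure-Python model of one such trick (not Spark code; the partition layout and function names are made up for illustration). It isn't data locality itself but a related saving: combining values within each partition before the shuffle, which is what Spark's reduceByKey does, versus shipping every raw (key, value) pair to the reducer. Both produce the same totals, but the combined version moves fewer records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy model: each inner list stands in for one RDD partition.
Partition = List[Tuple[str, int]]

def shuffle_without_combine(partitions: List[Partition]) -> Tuple[Dict[str, int], int]:
    """Naive map -> reduce: every (key, value) pair is 'sent over the network'."""
    shuffled = [pair for part in partitions for pair in part]
    totals: Dict[str, int] = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

def shuffle_with_combine(partitions: List[Partition]) -> Tuple[Dict[str, int], int]:
    """Sum per key inside each partition first, then shuffle only the partial sums."""
    shuffled: List[Tuple[str, int]] = []
    for part in partitions:
        local: Dict[str, int] = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals: Dict[str, int] = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

partitions = [
    [("spark", 1), ("spark", 1), ("hadoop", 1)],
    [("spark", 1), ("hadoop", 1), ("hadoop", 1)],
]
naive, naive_moved = shuffle_without_combine(partitions)
combined, combined_moved = shuffle_with_combine(partitions)
print(naive == combined, naive_moved, combined_moved)  # True 6 4
```

The gap grows with the number of repeated keys per partition, which is why the Spark docs steer you towards reduceByKey over groupByKey when you're aggregating.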
In addition to the core stuff, Spark also brings things like Spark Streaming, Spark SQL and
DataFrames, MLlib, GraphX, etc. Spark Streaming gives you micro-batches of RDDs at periodic
intervals. Think "give me the last 15 seconds of events every 5 seconds". You can then program
against that small collection, and the job will run in a fault-tolerant manner on your cluster.
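A rough sketch of that windowing idea, again in plain Python rather than the Spark Streaming API (in Spark you'd use DStream's window(windowLength, slideInterval)); the constants and windowed_counts are hypothetical, and fault tolerance is ignored entirely. Each 5-second micro-batch is a list of events, and on every slide we look at the last three batches:

```python
from collections import deque
from typing import Deque, Iterable, List

BATCH_INTERVAL_S = 5   # assumed: a new micro-batch arrives every 5 seconds
WINDOW_LENGTH_S = 15   # "give me the last 15 seconds of events"
SLIDE_INTERVAL_S = 5   # "... every 5 seconds"

def windowed_counts(batches: Iterable[List[str]]) -> List[int]:
    """Count events in a sliding window over a stream of micro-batches."""
    batches_per_window = WINDOW_LENGTH_S // BATCH_INTERVAL_S  # 3 batches
    batches_per_slide = SLIDE_INTERVAL_S // BATCH_INTERVAL_S  # 1 batch
    window: Deque[List[str]] = deque(maxlen=batches_per_window)
    results: List[int] = []
    for i, batch in enumerate(batches, start=1):
        window.append(batch)  # the oldest batch falls out automatically
        if i % batches_per_slide == 0:
            # The "small collection" you program against: all events in the window.
            events = [event for b in window for event in b]
            results.append(len(events))
    return results

stream = [["a", "b"], ["c"], ["d", "e", "f"], [], ["g"]]
print(windowed_counts(stream))  # [2, 3, 6, 4, 4]
```

Note that the window and slide are whole multiples of the batch interval here; Spark Streaming has the same requirement.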
Spark SQL provides Hive-like functionality that works nicely with various data sources and
with RDDs. MLlib provides a lot of out-of-the-box machine learning algorithms, and the new Spark ML
project provides a nice, elegant pipeline API to take care of a lot of common machine learning tasks.
GraphX allows you to represent data as graphs and run graph algorithms on them, e.g. you can
represent your data as RDDs of vertices and edges and run PageRank on a distributed cluster.
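PageRank also speaks to the original question: each iteration is a map (every vertex sends a share of its rank along its out-edges) followed by a reduce (sum the shares per vertex), so it can be written as a chain of MapReduce jobs, but that chain runs many times, which is where keeping data in memory between iterations pays off. Below is a textbook-style plain-Python version, assuming every vertex has at least one outgoing edge (no dangling-node handling). It is not GraphX's implementation (in Scala you'd call graph.pageRank(tol)), which may differ in details such as normalisation.

```python
from typing import Dict, List, Tuple

def pagerank(vertices: List[str], edges: List[Tuple[str, str]],
             damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Textbook iterative PageRank; assumes every vertex has an out-edge."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    ranks = {v: 1.0 / n for v in vertices}  # start from a uniform distribution
    for _ in range(iterations):
        # "Map" step: each edge carries a share of its source's rank to its destination.
        incoming = {v: 0.0 for v in vertices}
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        # "Reduce" step: damped sum of incoming shares plus the random-jump term.
        ranks = {v: (1 - damping) / n + damping * incoming[v] for v in vertices}
    return ranks

vertices = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(vertices, edges)
print({v: round(r, 3) for v, r in ranks.items()})
```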
And there's more... so, yeah... Spark is definitely "not just" MapReduce. :)

> Date: Sun, 28 Jun 2015 09:13:18 -0700
> From: jonrgregg@gmail.com
> To: user@spark.apache.org
> Subject: What does "Spark is not just MapReduce" mean?  Isn't every Spark job a form of MapReduce?
> 
> I've heard "Spark is not just MapReduce" mentioned during Spark talks, but it
> seems like every method that Spark has is really doing something like (Map
> -> Reduce) or (Map -> Map -> Map -> Reduce) etc behind the scenes, with the
> performance benefit of keeping RDDs in memory between stages.
> 
> Am I wrong about that?  Is Spark doing anything more efficiently than a
> series of Maps followed by a Reduce in memory?  What methods does Spark have
> that can't easily be mapped (with somewhat similar efficiency) to Map and
> Reduce in memory?