spark-user mailing list archives

From Stephen Boesch <java...@gmail.com>
Subject Re: What does "Spark is not just MapReduce" mean? Isn't every Spark job a form of MapReduce?
Date Sun, 28 Jun 2015 21:23:29 GMT
Vanilla MapReduce does not expose it, but Hive on top of MapReduce has
better partitioning (and bucketing) support than Spark.

2015-06-28 13:44 GMT-07:00 Koert Kuipers <koert@tresata.com>:

> spark is partitioner aware, so it can exploit a situation where 2 datasets
> are partitioned the same way (for example by doing a map-side join on
> them). map-red does not expose this.
>
> On Sun, Jun 28, 2015 at 12:13 PM, YaoPau <jonrgregg@gmail.com> wrote:
>
>> I've heard "Spark is not just MapReduce" mentioned during Spark talks, but
>> it seems like every method that Spark has is really doing something like
>> (Map -> Reduce) or (Map -> Map -> Map -> Reduce) etc. behind the scenes,
>> with the performance benefit of keeping RDDs in memory between stages.
>>
>> Am I wrong about that? Is Spark doing anything more efficiently than a
>> series of Maps followed by a Reduce in memory? What methods does Spark
>> have that can't easily be mapped (with somewhat similar efficiency) to Map
>> and Reduce in memory?
>
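
To make the point about partitioner awareness concrete, here is a minimal sketch in plain Python, not the Spark API. It assumes two key/value datasets that have both been split with the same hash function and the same number of partitions; the helper names `hash_partition` and `co_partitioned_join` are hypothetical. Because matching keys are guaranteed to sit in the same partition index on both sides, each pair of partitions can be joined locally without moving records between partitions. In Spark, the analogous situation is two pair RDDs that share the same partitioner.

```python
from collections import defaultdict


def hash_partition(pairs, num_partitions):
    """Split (key, value) pairs into buckets by hash(key) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def co_partitioned_join(left_parts, right_parts):
    """Inner-join partition i of the left side only with partition i of the right.

    This is only correct when both sides were partitioned the same way,
    which is the property a partitioner-aware engine can exploit.
    """
    assert len(left_parts) == len(right_parts), "partition counts must match"
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a small lookup table for this one partition of the right side.
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        # Probe it with the matching partition of the left side.
        for key, left_value in left:
            for right_value in index.get(key, []):
                joined.append((key, (left_value, right_value)))
    return joined


# Illustrative data only (integer keys, so hashing is deterministic).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]

result = co_partitioned_join(hash_partition(users, 4), hash_partition(orders, 4))
print(sorted(result))
```

If the two sides were partitioned differently, the per-partition join above would silently miss matches, which is why an engine without partitioner information has to repartition (shuffle) at least one side first.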
