spark-user mailing list archives

From Jean-Baptiste Onofré <...@nanthrax.net>
Subject Re: Does Spark use more memory than MapReduce?
Date Mon, 12 Oct 2015 18:34:45 GMT
Hi,

I think it depends on the storage level you use (MEMORY_ONLY, DISK_ONLY, or 
MEMORY_AND_DISK).

By default, micro-batching as designed in Spark requires more memory but 
is much faster: when you use MapReduce, each map and reduce task has to 
use HDFS as the backend of the data pipeline between tasks. In Spark, 
a disk flush is not always performed: it tries to keep data in memory as 
much as possible. So there is a balance to find between fast 
processing/micro-batching and memory consumption.
In some cases, using the disk is faster anyway (for instance, a 
MapReduce shuffle can be faster than a Spark shuffle, but you have an 
option to run a ShuffleMapReduceTask from Spark).
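
To make the memory/disk balance a bit more concrete, here is a minimal toy 
sketch in plain Python. It is not Spark code and not how Spark implements 
its block storage: `ToyPartitionStore`, `memory_budget_bytes`, and `demo` 
are hypothetical names made up for illustration. It only mimics the idea 
behind a MEMORY_AND_DISK-style level: keep a partition in memory while it 
fits in a budget, and write it to disk otherwise.

```python
import os
import tempfile


class ToyPartitionStore:
    """Toy memory-first store that falls back to disk (illustration only)."""

    def __init__(self, memory_budget_bytes, spill_dir):
        self.memory_budget_bytes = memory_budget_bytes
        self.spill_dir = spill_dir
        self.in_memory = {}   # partition id -> bytes kept in RAM
        self.on_disk = {}     # partition id -> path of the file on disk
        self.memory_used = 0

    def put(self, partition_id, data):
        if self.memory_used + len(data) <= self.memory_budget_bytes:
            # Fits in the budget: keep it in memory (fast to read back).
            self.in_memory[partition_id] = data
            self.memory_used += len(data)
        else:
            # Does not fit: write it to disk instead of failing.
            path = os.path.join(self.spill_dir, f"part-{partition_id}.bin")
            with open(path, "wb") as f:
                f.write(data)
            self.on_disk[partition_id] = path

    def get(self, partition_id):
        if partition_id in self.in_memory:
            return self.in_memory[partition_id]
        # Slower path: the partition has to be read back from disk.
        with open(self.on_disk[partition_id], "rb") as f:
            return f.read()


def demo():
    with tempfile.TemporaryDirectory() as spill_dir:
        # Budget of 250 bytes and four partitions of 100 bytes each.
        store = ToyPartitionStore(memory_budget_bytes=250, spill_dir=spill_dir)
        for i in range(4):
            store.put(i, bytes([i]) * 100)
        round_trip_ok = all(store.get(i) == bytes([i]) * 100 for i in range(4))
        return sorted(store.in_memory), sorted(store.on_disk), round_trip_ok
```

With these numbers, partitions 0 and 1 stay in memory and partitions 2 and 
3 go to disk. A larger budget trades memory for fewer disk reads, which is 
the balance described above. In PySpark itself the choice is made by 
passing a level such as `StorageLevel.MEMORY_AND_DISK` to `persist()`.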

I defer to the experts on this ;)

Regards
JB

On 10/12/2015 06:52 PM, YaoPau wrote:
> I had this question come up and I'm not sure how to answer it.  A user said
> that, for a big job, he thought it would be better to use MapReduce since it
> writes to disk between iterations instead of keeping the data in memory the
> entire time like Spark generally does.
>
> I mentioned that Spark can cache to disk as well, but I'm not sure about the
> overarching question (which I realize is vague): for a typical job, would
> Spark use more memory than a MapReduce job?  Are there any memory usage
> inefficiencies from either?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-use-more-memory-than-MapReduce-tp25030.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


