spark-user mailing list archives

From Jean-Baptiste Onofré
Subject Re: Does Spark use more memory than MapReduce?
Date Mon, 12 Oct 2015 18:34:45 GMT

I think it depends on the storage level you use (MEMORY, DISK, or a mix of the two).

By default, the in-memory processing Spark is designed around requires more memory but 
is much faster: with MapReduce, each map and reduce task has to 
use HDFS as the backend of the data pipeline between tasks. In Spark, 
a disk flush is not always performed: it tries to keep data in memory as 
much as possible. So there is a balance to find between fast 
in-memory processing and memory consumption.
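That trade-off is exposed per RDD through persist(). A minimal Scala sketch (Spark 1.x API; the input path and app name are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("storage-levels"))
val lines = sc.textFile("hdfs:///tmp/input.txt")  // placeholder path

// An RDD gets exactly one storage level; it cannot be changed afterwards.
// MEMORY_ONLY:     keep partitions in memory; ones that don't fit are
//                  recomputed from lineage when needed (the default).
// MEMORY_AND_DISK: spill partitions that don't fit in memory to local disk.
// DISK_ONLY:       store only on disk, trading speed for a small footprint.
val cached = lines.persist(StorageLevel.MEMORY_AND_DISK)

println(cached.count())
```

Choosing MEMORY_AND_DISK or DISK_ONLY is how you push a big job closer to MapReduce-style behavior without giving up the rest of Spark's pipeline.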
In some cases, using the disk is faster anyway: for instance, a 
MapReduce shuffle can be faster than a Spark shuffle, but Spark gives 
you options to choose its shuffle implementation.
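In Spark 1.x (current as of this thread), the shuffle implementation is selectable through configuration; a sketch, assuming the standard 1.x property names:

```scala
import org.apache.spark.SparkConf

// "sort" has been the default shuffle manager since Spark 1.2;
// "hash" is the older alternative.
val conf = new SparkConf()
  .setAppName("shuffle-choice")          // placeholder app name
  .set("spark.shuffle.manager", "sort")  // or "hash"
  .set("spark.shuffle.spill", "true")    // allow spilling shuffle data to disk
```

So even for the shuffle, Spark can lean on the disk rather than holding everything in memory.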

I'm speaking under cover of the experts ;)


On 10/12/2015 06:52 PM, YaoPau wrote:
> I had this question come up and I'm not sure how to answer it.  A user said
> that, for a big job, he thought it would be better to use MapReduce since it
> writes to disk between iterations instead of keeping the data in memory the
> entire time like Spark generally does.
> I mentioned that Spark can cache to disk as well, but I'm not sure about the
> overarching question (which I realize is vague): for a typical job, would
> Spark use more memory than a MapReduce job?  Are there any memory usage
> inefficiencies from either?

Jean-Baptiste Onofré
Talend -
