spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: map - reduce only with disk
Date Tue, 02 Jun 2015 02:40:26 GMT
As long as you don't use cache(), these operations will stream from disk to disk and use only a fixed amount of memory for intermediate results. However, note that because you're using groupByKey, all the values for each key must fit in memory at once. Since you're going to reduce right after, you should use reduceByKey instead, which combines values incrementally and will be more efficient.
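The memory difference can be sketched in plain Python (an illustration of the two aggregation patterns, not the actual Spark API): the groupByKey pattern materializes the full list of values for each key before reducing, while the reduceByKey pattern keeps only one running value per key.

```python
# Illustrative sketch (not Spark API): contrast the memory profile of
# group-then-reduce versus an incremental per-key reduce.

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

# groupByKey-style: every value for a key is held in memory at once.
grouped = {}
for k, v in pairs:
    grouped.setdefault(k, []).append(v)   # full value list per key
group_then_reduce = {k: sum(vs) for k, vs in grouped.items()}

# reduceByKey-style: only a single running value is kept per key.
reduced = {}
for k, v in pairs:
    reduced[k] = reduced.get(k, 0) + v    # constant state per key

assert group_then_reduce == reduced       # same result, smaller footprint
```

In real Spark the analogous change is replacing groupByKey followed by a reduce with a single reduceByKey(_ + _), which also combines values map-side before the shuffle.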

Matei

> On Jun 1, 2015, at 2:21 PM, octavian.ganea <octavian.ganea@inf.ethz.ch> wrote:
> 
> Dear all,
> 
> Does anyone know how can I force Spark to use only the disk when doing a
> simple flatMap(..).groupByKey.reduce(_ + _) ? Thank you!
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/map-reduce-only-with-disk-tp23102.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 


