spark-user mailing list archives

From Jey Kottalam <...@cs.berkeley.edu>
Subject Re: Giraph Vs SPARK
Date Thu, 23 Jan 2014 22:14:05 GMT
Hi Suman,

Spark does indeed do in-memory computation, and does not require
spilling to disk after every map task. Could you explain where you
"see that intermediate map outputs gets written to disk"? Perhaps
you're seeing some intermediate results during a shuffle phase? In
that case, you may want to look into the
"spark.shuffle.consolidateFiles" option:
https://spark.incubator.apache.org/docs/0.8.1/configuration.html
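
As a rough sketch of how that option might be turned on: the 0.8.x releases read Spark properties from Java system properties, so one approach is to set the property in the driver before the SparkContext is created. The class and helper names below (ShuffleConfig, enableConsolidation) are hypothetical and only for illustration; the property name is the one mentioned above. Please check the configuration page for the version you are running before relying on this.

```java
public class ShuffleConfig {
    // Property name taken from the message above.
    static final String CONSOLIDATE_KEY = "spark.shuffle.consolidateFiles";

    // Hypothetical helper: sets the JVM system property and
    // returns the value that is now in effect.
    static String enableConsolidation() {
        System.setProperty(CONSOLIDATE_KEY, "true");
        return System.getProperty(CONSOLIDATE_KEY);
    }

    public static void main(String[] args) {
        String value = enableConsolidation();
        System.out.println(CONSOLIDATE_KEY + "=" + value);
        // Assumption: the SparkContext would be constructed after this point,
        // so that it picks up the property when it starts.
    }
}
```

This only affects how shuffle files are laid out on disk; it does not move the shuffle itself into memory.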

-Jey

On Thu, Jan 23, 2014 at 1:10 PM, suman bharadwaj <suman.dna@gmail.com> wrote:
> Hi,
>
> I might be wrong, but need your help.
>
> My understanding of Giraph is that it doesn't write the intermediate data
> to disk while sending messages to different machines. But in SPARK, I see
> that intermediate map outputs gets written to disk. Why does SPARK write
> intermediate data to disk ?
>
> What happens at reducer side ? Does SPARK write the data again to disk ? How
> does it differ from Hadoop MR ?
>
> Can't SPARK communicate everything in memory ?
>
> If my understanding is wrong, please do correct me.
>
> Regards,
> Suman Bharadwaj S
