spark-user mailing list archives

From Michael Segel
Subject Re: Save a spark RDD to disk
Date Wed, 09 Nov 2016 17:29:51 GMT
Can you increase the number of partitions and also the number of executors?
(This should improve parallelism, but you may become disk I/O bound.)
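
A rough back-of-the-envelope sketch of the numbers in this thread may help judge whether the job is already close to being I/O bound. It uses only the figures quoted below (900 GB, 20 minutes, 45 executors, 5 cores each, 5 worker nodes). The 128 MB target partition size and the helper names are assumptions made for illustration; they do not come from the thread or from any Spark API.

```python
import math


def write_throughput_mb_s(data_gb, minutes):
    """Aggregate write rate in MB/s for a job that wrote data_gb in the given minutes."""
    return data_gb * 1024 / (minutes * 60)


def suggested_partitions(data_gb, target_partition_mb, task_slots):
    """Partition count for a target partition size, rounded up to a whole
    number of task waves so every slot is busy in every wave."""
    raw = math.ceil(data_gb * 1024 / target_partition_mb)
    waves = math.ceil(raw / task_slots)
    return waves * task_slots


# Figures from the original question.
executors, cores_per_executor, nodes = 45, 5, 5
data_gb, minutes = 900, 20

# Assumption: 128 MB per partition (hypothetical target, not from the thread).
target_partition_mb = 128

task_slots = executors * cores_per_executor        # concurrent tasks
aggregate = write_throughput_mb_s(data_gb, minutes)
per_node = aggregate / nodes
per_slot = aggregate / task_slots
partitions = suggested_partitions(data_gb, target_partition_mb, task_slots)

print(f"task slots:           {task_slots}")
print(f"aggregate throughput: {aggregate:.1f} MB/s")
print(f"per-node throughput:  {per_node:.1f} MB/s")
print(f"per-slot throughput:  {per_slot:.2f} MB/s")
print(f"suggested partitions: {partitions}")
```

If the per-node figure is already near what the disks or network on each worker can sustain, adding partitions or executors is unlikely to shorten the save much. If it is well below that, more parallelism is worth trying.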

On Nov 8, 2016, at 4:08 PM, Elf Of Lothlorein wrote:

I am trying to save an RDD to disk using saveAsNewAPIHadoopFile. It takes almost 20 minutes
for about 900 GB of data. Is there any parameter I can tune to make the save faster?
I am running about 45 executors with 5 cores each on 5 Spark worker nodes, using Spark on
YARN.
Thanks for your help.
