spark-user mailing list archives

From Andrew Holway <andrew.hol...@otternetworks.de>
Subject Re: Save a spark RDD to disk
Date Tue, 08 Nov 2016 22:14:48 GMT
That's around 750 MB/s, which seems quite respectable even in this day and age!

How many and what kind of disks do you have attached to your nodes? What
are you expecting?
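
For reference, a rough sketch of where that figure comes from, using only the
numbers quoted below (900 GB, 20 minutes, 5 worker nodes). It assumes decimal
units (1 GB = 1000 MB) and an even spread of writes across the nodes; the
function name is just illustrative.

```python
# Back-of-the-envelope write throughput from the figures in the thread.
# Assumption: decimal units, i.e. 1 GB = 1000 MB.
DATA_GB = 900        # "about 900 GB of data"
DURATION_MIN = 20    # "almost 20 mins"
WORKER_NODES = 5     # "5 Spark worker nodes"


def throughput_mb_per_s(data_gb, duration_min):
    """Average throughput in MB/s for data_gb written in duration_min."""
    return data_gb * 1000 / (duration_min * 60)


aggregate = throughput_mb_per_s(DATA_GB, DURATION_MIN)
# Assumption: writes are spread evenly across the worker nodes.
per_node = aggregate / WORKER_NODES

print(f"aggregate: {aggregate:.0f} MB/s, per node: {per_node:.0f} MB/s")
```

So each node would be sustaining roughly 150 MB/s on average. If the output
goes to a replicated filesystem, the physical write rate per node would be
higher than that, but the thread does not say where the data lands.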

On Tue, Nov 8, 2016 at 11:08 PM, Elf Of Lothlorein <redarrowg2@gmail.com>
wrote:

> Hi
> I am trying to save an RDD to disk and I am using
> saveAsNewAPIHadoopFile for that. I am seeing that it takes almost 20 mins
> for about 900 GB of data. Is there any parameter that I can tune to make
> this saving faster?
> I am running about 45 executors with 5 cores each on 5 Spark worker nodes
> and using Spark on YARN for this.
> Thanks for your help.
> C
>



-- 
Otter Networks UG
http://otternetworks.de
Gotenstraße 17
10829 Berlin
