spark-user mailing list archives

From 颜发才(Yan Facai) <facai....@gmail.com>
Subject Re: How to improve performance of saveAsTextFile()
Date Sat, 11 Mar 2017 11:53:02 GMT
How about increasing the number of partitions in the RDD, or rebalancing the data across partitions?
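
A rough sketch of the suggestion above, in Java. It assumes the job writes slowly because there are too few, or unevenly sized, partitions. The helper name suggestPartitions and the figure of 8 cores per node are hypothetical; only the 10 nodes come from the thread. The multiplier follows the Spark tuning guide's general advice of about 2-3 tasks per CPU core. The Spark calls are shown as comments so the block compiles without Spark on the classpath.

```java
// Hypothetical helper, not part of Spark: picks a partition count from cluster size.
class PartitionSizing {

    /** Total cores times tasks per core (the Spark tuning guide suggests roughly 2-3). */
    static int suggestPartitions(int nodes, int coresPerNode, int tasksPerCore) {
        if (nodes <= 0 || coresPerNode <= 0 || tasksPerCore <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently
        return Math.multiplyExact(Math.multiplyExact(nodes, coresPerNode), tasksPerCore);
    }

    public static void main(String[] args) {
        // 10 nodes is from the thread; 8 cores per node is an assumption.
        int partitions = suggestPartitions(10, 8, 3);
        System.out.println("suggested partitions = " + partitions);

        // With Spark available, the RDD could be rebalanced before writing:
        //   int current = rdd.getNumPartitions();          // check what you have now
        //   JavaRDD<String> balanced = rdd.repartition(partitions);
        //   balanced.saveAsTextFile("hdfs://…");
    }
}
```

Note that repartition() triggers a full shuffle, so it only pays off when the write stage is limited by low parallelism or skew; comparing getNumPartitions() with the suggested value first is a cheap check.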

On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud <mparsian@illumina.com>
wrote:

> How can I improve the performance of JavaRDD<String>.saveAsTextFile("hdfs://…")?
> This is taking over 30 minutes on a cluster of 10 nodes.
> Running Spark on YARN.
>
> JavaRDD<String> has 120 million entries.
>
> Thank you,
> Best regards,
> Mahmoud
>
