spark-user mailing list archives

From "Md. Rezaul Karim" <>
Subject Re: How to tune number of tasks
Date Thu, 26 Jan 2017 17:36:47 GMT

If you need all the partitions saved as a single output file with
saveAsTextFile, you can use coalesce(1, true).saveAsTextFile(). This
basically means: do the computation, then coalesce to only 1 partition.
You can also use repartition(1), which is just a wrapper for coalesce
with the shuffle argument set to true.

val yourRDD = ....
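
As a rough sketch of the above (not taken from a real job): the object
name, the helper withOutputFiles, the stand-in data and the temporary
output directory are all made up for illustration, and it assumes Spark
is on the classpath. Passing a number other than 1 should give that many
partitions, and so that many output files, since saveAsTextFile writes
one part file per partition.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OutputPartitionsSketch {

  // Hypothetical helper: reshuffle an RDD into `n` partitions.
  // coalesce(n, shuffle = true) is what repartition(n) calls internally.
  def withOutputFiles[T](rdd: RDD[T], n: Int): RDD[T] =
    rdd.coalesce(n, shuffle = true)

  def main(args: Array[String]): Unit = {
    // local[8] mirrors the 8-core local machine from the question.
    val conf = new SparkConf()
      .setAppName("output-partitions-sketch")
      .setMaster("local[8]")
    val sc = new SparkContext(conf)

    try {
      // Stand-in data with 193 partitions; the real job would use its own RDD.
      val yourRDD = sc.parallelize(1 to 1000, numSlices = 193)

      val single = withOutputFiles(yourRDD, 1) // one partition, one output file
      val eight  = yourRDD.repartition(8)      // eight partitions, eight output files

      println(single.getNumPartitions)
      println(eight.getNumPartitions)

      // saveAsTextFile fails if the target directory already exists,
      // so write into a fresh sub-directory of a temp directory.
      val outDir = Files.createTempDirectory("spark-out").resolve("single").toString
      single.saveAsTextFile(outDir)
    } finally {
      sc.stop()
    }
  }
}
```

Note that coalescing to 1 partition pushes the final write through a
single task, so for large outputs a small number greater than 1 may be
the better trade-off.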

Hope that helps.

*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland

On 26 January 2017 at 16:21, Soheila S. <> wrote:

> Hi all,
> Please tell me how I can tune the number of output partitions.
> I run my Spark job on my local machine with 8 cores, and the input data
> is 6.5GB. It creates 193 tasks and puts the output into 193 partitions.
> How can I change the number of tasks and, consequently, the number of
> output files?
> Best,
> Soheila
