spark-user mailing list archives

From "Md. Rezaul Karim" <rezaul.ka...@insight-centre.org>
Subject Re: How to tune number of tasks
Date Thu, 26 Jan 2017 17:36:47 GMT
Hi,

If you need all the partitions saved into a single file with
saveAsTextFile, you can use coalesce(1, true).saveAsTextFile(). This
basically means: do the computation, then coalesce to only 1 partition.
You can also use repartition(1), which is just a wrapper for coalesce that
sets the shuffle argument to true.

val yourRDD = ....
yourRDD.coalesce(1).saveAsTextFile("data/output")
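
If a single output file is more than you want, the same two calls accept any
target partition count. Below is a minimal sketch, assuming a local
SparkContext with 8 threads to match the 8-core machine in the question. The
app name, the stand-in data, and the output path "data/output-8-parts" are
placeholders for illustration, not anything taken from the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed setup: local mode with 8 worker threads.
val conf = new SparkConf().setAppName("partition-tuning-sketch").setMaster("local[8]")
val sc = new SparkContext(conf)

// Stand-in data split into 193 partitions, mimicking the 193 tasks in the question.
// In the real job this would be the RDD built from the 6.5GB input.
val yourRDD = sc.parallelize(1 to 1000, numSlices = 193)

// coalesce(n) merges existing partitions without a shuffle.
// Because there is no stage boundary, upstream work may also run as only n tasks.
val merged = yourRDD.coalesce(8)

// repartition(n) is coalesce(n, shuffle = true): upstream work keeps its
// original parallelism and the data is redistributed into n partitions.
val reshuffled = yourRDD.repartition(8)

println(merged.getNumPartitions)
println(reshuffled.getNumPartitions)

// saveAsTextFile writes one part-file per partition.
// Hypothetical path; the call fails if the directory already exists.
reshuffled.saveAsTextFile("data/output-8-parts")

sc.stop()
```

As a rule of thumb, coalesce without shuffle is cheaper when you only want
fewer output files, while repartition is the safer choice when the earlier
computation is heavy and should keep its full parallelism.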


Hope that helps.



Regards,
_________________________________
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html
<http://139.59.184.114/index.html>

On 26 January 2017 at 16:21, Soheila S. <soheila518@gmail.com> wrote:

> Hi all,
>
> Please tell me how I can tune the number of output partitions.
> I run my Spark job on my local machine with 8 cores, and the input data is
> 6.5GB. It creates 193 tasks and puts the output into 193 partitions.
> How can I change the number of tasks and consequently, the number of
> output files?
>
> Best,
> Soheila
>
