spark-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Spark - Partitions
Date Fri, 13 Oct 2017 04:07:20 GMT
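Use repartition() on the DataFrame before writing. Each write task produces
its own output file(s), so fewer partitions at write time means fewer files.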
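
A rough sketch of what that could look like in PySpark, using the numbers from
the question quoted below (2000 files of about 10 MB each). The 128 MB target
file size is my assumption, not something from the original job, and the
helper names target_file_count and write_with_fewer_files are made up for
illustration. insertInto assumes the Hive table already exists.

```python
import math


def target_file_count(total_mb: float, target_file_mb: float = 128.0) -> int:
    """Return how many files are needed so each is roughly target_file_mb.

    target_file_mb defaults to 128, an assumed target size (not from the thread).
    """
    if total_mb <= 0:
        return 1
    return max(1, math.ceil(total_mb / target_file_mb))


def write_with_fewer_files(df, table_name: str, num_files: int) -> None:
    """Shuffle df into num_files partitions, then append to a Hive table.

    df is expected to be a pyspark.sql.DataFrame. The extra shuffle happens
    after the transformations, so spark.sql.shuffle.partitions can stay at
    2000 for the heavy stages.
    """
    df.repartition(num_files).write.mode("append").insertInto(table_name)


# 2000 files x 10 MB each, as reported in the question.
current_mb = 2000 * 10
num_files = target_file_count(current_mb)
print(num_files)  # 157
```

With a real DataFrame this would be called as
write_with_fewer_files(result_df, "db.target_table", num_files), where both
names are placeholders. coalesce(num_files) is the other common option: it
avoids the extra shuffle, but it can also reduce the parallelism of the stage
that feeds the write, so it may cost some of the speed gained from 2000
shuffle partitions.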
On 13-Oct-2017 9:35 AM, "KhajaAsmath Mohammed" <mdkhajaasmath@gmail.com>
wrote:

> Hi,
>
> I am reading from a Hive query and writing the data back into Hive after
> doing some transformations.
>
> I changed spark.sql.shuffle.partitions to 2000, and since then the job
> completes quickly, but the main problem is that I now get 2000 files for
> each partition, each about 10 MB in size.
>
> Is there a way to get the same performance but write fewer files?
>
> I am trying repartition now, but would like to know if there are any other
> options.
>
> Thanks,
> Asmath
>
