spark-user mailing list archives

From German Schiavon <gschiavonsp...@gmail.com>
Subject Re: Thread spilling sort issue with single task
Date Tue, 26 Jan 2021 10:27:54 GMT
Hi,

One word: SKEW

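This looks like the classic data skew problem. You will need to apply a
skew-mitigation technique (for example, salting the join key) so that the
data is repartitioned evenly or, if you are on Spark 3.0+, try the adaptive
skew join optimization (spark.sql.adaptive.enabled together with
spark.sql.adaptive.skewJoin.enabled).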
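
To make the salting idea concrete, here is a minimal sketch in plain Python
(not Spark code). It assumes one "hot" join key dominates the large side of
the left join. The function names, NUM_SALTS, and the sample data are all
hypothetical and only illustrate the technique: add a random salt to the
key on the large side, and replicate each row of the small side once per
salt so every salted key still finds its match.

```python
import random
from collections import Counter

NUM_SALTS = 8  # hypothetical fan-out; tune to the degree of skew


def salt_large_side(rows, num_salts=NUM_SALTS, seed=0):
    """Attach a random salt to each join key on the large (skewed) side."""
    rng = random.Random(seed)
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts=NUM_SALTS):
    """Replicate each small-side row once per salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def left_join(left, right):
    """Plain hash-based left join on whatever key the rows carry."""
    lookup = {}
    for key, value in right:
        lookup.setdefault(key, []).append(value)
    joined = []
    for key, left_value in left:
        for right_value in lookup.get(key, [None]):
            joined.append((key, left_value, right_value))
    return joined


# Illustrative skewed input: the key "hot" accounts for almost every row.
large = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
small = [("hot", "H"), ("cold", "C")]

plain = left_join(large, small)
salted = left_join(salt_large_side(large), explode_small_side(small))

# Rows per join key before and after salting: the hot key is split into
# up to NUM_SALTS smaller groups that can be processed by separate tasks.
print(Counter(key for key, _ in large).most_common(1))
print(Counter(key for key, _ in salt_large_side(large)).most_common(3))
```

Dropping the salt from the joined rows gives back the same result as the
unsalted join. The trade-off is that the small side grows by a factor of
NUM_SALTS, so the fan-out should stay modest.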

On Tue, 26 Jan 2021 at 11:20, rajat kumar <kumar.rajat20del@gmail.com>
wrote:

> Hi Everyone,
>
> I am running a Spark application with two left joins. The first is a
> broadcast join and the second is a regular join.
> Out of 200 tasks, the last one is stuck. It is running at the "ANY"
> locality level. It looks like a data skew issue.
> The task spills heavily and its shuffle write is very large. The
> following message appears in the executor logs:
>
> INFO UnsafeExternalSorter: Thread spilling sort data of 10.4 GB to disk
> (10  times so far)
>
>
> Can anyone please suggest what could be wrong?
>
> Thanks
> Rajat
>
