spark-user mailing list archives

From Christopher Petrino <christopher.petr...@gmail.com>
Subject Re: Job hangs in blocked task in final parquet write stage
Date Thu, 29 Nov 2018 15:05:58 GMT
If not, try running a coalesce. Your data may have grown and is defaulting
to a number of partitions that is causing unnecessary overhead.
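
A minimal sketch of what that could look like in PySpark, assuming you
can estimate the output size up front. The helper names, the 128 MiB
target, and the size estimate are all hypothetical choices for
illustration, not anything taken from the job in this thread:

```python
import math

# Assumed target size per output partition (128 MiB); tune for your cluster.
DEFAULT_TARGET_BYTES = 128 * 1024 * 1024


def target_partitions(total_bytes, target_partition_bytes=DEFAULT_TARGET_BYTES):
    """Return a partition count that keeps each partition near the target size."""
    if total_bytes < 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be non-negative, and the target positive")
    # Always keep at least one partition, even for an empty dataset.
    return max(1, math.ceil(total_bytes / target_partition_bytes))


def write_coalesced(df, path, total_bytes):
    """Coalesce a PySpark DataFrame to the estimated count, then write Parquet.

    `df` is expected to be a pyspark.sql.DataFrame; `total_bytes` is the
    caller's own estimate of the data size (hypothetical input).
    """
    wanted = target_partitions(total_bytes)
    current = df.rdd.getNumPartitions()
    # coalesce() only ever reduces the partition count, so skip it otherwise.
    if wanted < current:
        df = df.coalesce(wanted)
    df.write.parquet(path)
```

For example, write_coalesced(df, "hdfs:///some/output/path", estimated_bytes)
with your own path and estimate. Since coalesce merges existing partitions
without a shuffle, it is worth checking that the sort order you need still
holds in the written files.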

On Thu, Nov 29, 2018 at 3:02 AM Conrad Lee <conrad@parsely.com> wrote:

> Thanks, I'll try using 5.17.0.
>
> For anyone trying to debug this problem in the future: In other jobs that
> hang in the same manner, the thread dump didn't have any blocked threads,
> so that might be a red herring.
>
> On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <
> christopher.petrino@gmail.com> wrote:
>
>> I ran into problems using 5.19, so I reverted to 5.17 and that resolved my
>> issues.
>>
>> On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee <conrad@parsely.com> wrote:
>>
>>> Hello Vadim,
>>>
>>> Interesting.  I've only been running this job at scale for a couple
>>> weeks so I can't say whether this is related to recent EMR changes.
>>>
>>> Much of the EMR-specific code for Spark has to do with writing files to
>>> S3.  In this case I'm writing files to the cluster's HDFS, though, so my
>>> sense is that this is a Spark issue, not an EMR one (but I'm not sure).
>>>
>>> Conrad
>>>
>>> On Tue, Nov 27, 2018 at 5:21 PM Vadim Semenov <vadim@datadoghq.com>
>>> wrote:
>>>
>>>> Hey Conrad,
>>>>
>>>> has it started happening recently?
>>>>
>>>> We recently started having sporadic problems with drivers on EMR
>>>> getting stuck; up until two weeks ago everything was fine.
>>>> We're trying to figure out with the EMR team where the issue is coming
>>>> from.
>>>> On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee <conrad@parsely.com> wrote:
>>>> >
>>>> > Dear spark community,
>>>> >
>>>> > I'm running spark 2.3.2 on EMR 5.19.0.  I've got a job that's hanging
>>>> in the final stage--the job usually works, but I see this hanging behavior
>>>> in about one out of 50 runs.
>>>> >
>>>> > The second-to-last stage sorts the dataframe, and the final stage
>>>> writes the dataframe to HDFS.
>>>> >
>>>> > Here you can see the executor logs, which indicate that it has
>>>> finished processing the task.
>>>> >
>>>> > Here you can see the thread dump from the executor that's hanging.
>>>> Here's the text of the blocked thread.
>>>> >
>>>> > I tried to work around this problem by enabling speculation, but
>>>> speculative execution never takes place.  I don't know why.
>>>> >
>>>> > Can anyone here help me?
>>>> >
>>>> > Thanks,
>>>> > Conrad
>>>
