Thanks, I'll try using 5.17.0.

For anyone trying to debug this problem in the future: in other jobs that hung in the same manner, the thread dumps didn't show any blocked threads, so the blocked thread in my original message might be a red herring.
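
A quick way to check whether a dump contains blocked threads at all is to count the thread states in it. The following is a minimal sketch, assuming a text dump in the format HotSpot's jstack prints (the Spark UI's thread dump page may be laid out differently). The function name and the two-thread sample are made up for illustration and are not taken from the actual job.

```python
import re
from collections import Counter

# Matches the state line that HotSpot's jstack prints under each thread header,
# e.g. "   java.lang.Thread.State: BLOCKED (on object monitor)".
STATE_LINE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump_text):
    """Return a Counter mapping each thread state to its number of occurrences."""
    states = Counter()
    for line in dump_text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            states[match.group(1)] += 1
    return states


# Hypothetical excerpt, invented for illustration only.
SAMPLE_DUMP = """
"Executor task launch worker for task 1" #50 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
"dispatcher-event-loop-0" #20 daemon prio=5
   java.lang.Thread.State: WAITING (parking)
"""

if __name__ == "__main__":
    states = count_thread_states(SAMPLE_DUMP)
    print("BLOCKED threads:", states["BLOCKED"])
```

Comparing these counts between a hanging run and a healthy run would show whether a BLOCKED thread is specific to the hang or appears in both.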

On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <christopher.petrino@gmail.com> wrote:
I ran into problems using 5.19, so I reverted to 5.17 and that resolved my issues.

On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee <conrad@parsely.com> wrote:
Hello Vadim,

Interesting. I've only been running this job at scale for a couple of weeks, so I can't say whether this is related to recent EMR changes.

Much of the EMR-specific code for Spark has to do with writing files to S3. In this case I'm writing files to the cluster's HDFS, though, so my sense is that this is a Spark issue, not an EMR one (but I'm not sure).

Conrad

On Tue, Nov 27, 2018 at 5:21 PM Vadim Semenov <vadim@datadoghq.com> wrote:
Hey Conrad,

has it started happening recently?

We recently started having sporadic problems with drivers getting stuck
on EMR; up until two weeks ago everything was fine.
We're working with the EMR team to figure out where the issue is coming from.

On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee <conrad@parsely.com> wrote:
>
> Dear spark community,
>
> I'm running spark 2.3.2 on EMR 5.19.0.  I've got a job that's hanging in the final stage--the job usually works, but I see this hanging behavior in about one out of 50 runs.
>
> The second-to-last stage sorts the dataframe, and the final stage writes the dataframe to HDFS.
>
> Here you can see the executor logs, which indicate that it has finished processing the task.
>
> Here you can see the thread dump from the executor that's hanging.  Here's the text of the blocked thread.
>
> I tried to work around this problem by enabling speculation, but speculative execution never takes place.  I don't know why.
>
> Can anyone here help me?
>
> Thanks,
> Conrad
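
On the speculation workaround mentioned in the original message: Spark's documentation describes `spark.speculation.quantile` as the fraction of a stage's tasks that must complete before speculation is considered, and `spark.speculation.multiplier` as how many times slower than the median a task must be. The sketch below is a simplified model of those two documented thresholds plus the matching `spark-submit` flags. The helper names are hypothetical, the values are the documented defaults, and the model is not Spark's actual scheduler code, so it does not by itself explain the hang discussed in this thread.

```python
import statistics


def speculation_conf_flags(quantile=0.75, multiplier=1.5, interval="100ms"):
    """Build spark-submit --conf flags that turn on speculative execution."""
    settings = {
        "spark.speculation": "true",
        "spark.speculation.interval": interval,      # how often Spark checks
        "spark.speculation.quantile": str(quantile),
        "spark.speculation.multiplier": str(multiplier),
    }
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


def would_speculate(total_tasks, finished_durations_ms, running_ms,
                    quantile=0.75, multiplier=1.5):
    """Simplified model: is a still-running task a candidate for speculation?"""
    # Enough of the stage's tasks must have finished first.
    min_finished = int(quantile * total_tasks)
    if not finished_durations_ms or len(finished_durations_ms) < min_finished:
        return False
    # The running task must be much slower than the median finished task.
    threshold_ms = multiplier * statistics.median(finished_durations_ms)
    return running_ms > threshold_ms


if __name__ == "__main__":
    print(" ".join(speculation_conf_flags()))
    # 150 of 200 tasks finished in 1 s each; one task has run for 5 s.
    print(would_speculate(200, [1000] * 150, 5000))
```

Under this model a straggler is only eligible once the quantile of finished tasks is reached, so checking how many tasks of the final stage were reported as finished to the driver is one thing that could be worth looking at.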


