spark-user mailing list archives

From Aniket Bhatnagar <aniket.bhatna...@gmail.com>
Subject Re: Very long pause/hang at end of execution
Date Wed, 16 Nov 2016 10:48:27 GMT
Also, how are you launching the application? Through spark-submit, or by
creating a SparkContext in your app?

Thanks,
Aniket

On Wed, Nov 16, 2016 at 10:44 AM Aniket Bhatnagar <
aniket.bhatnagar@gmail.com> wrote:

> Thanks for sharing the thread dumps. I had a look at them and couldn't find
> anything unusual. Is there anything in the logs (driver + executor) that
> suggests what's going on? Also, what does the spark job do and what is the
> version of spark and hadoop you are using?
>
> Thanks,
> Aniket
>
>
> On Wed, Nov 16, 2016 at 2:07 AM Michael Johnson <mjjohnson.geo@yahoo.com>
> wrote:
>
> The extremely long hang/pause has started happening again. I've been
> running on a small remote cluster, so I used the UI to grab thread dumps
> rather than doing it from the command line. There seems to be one executor
> still alive, along with the driver; I grabbed 4 thread dumps from each, a
> couple of seconds apart. I'd greatly appreciate any help tracking down
> what's going on! (I've attached them, but I can paste them somewhere if
> that's more convenient.)
>
> Thanks,
> Michael
>
>
>
>
> On Sunday, November 6, 2016 10:49 PM, Michael Johnson
> <mjjohnson.geo@yahoo.com.INVALID> wrote:
>
>
> Hm. Something must have changed, as it was happening quite consistently
> and now I can't get it to reproduce. Thank you for the offer, and if it
> happens again I will try grabbing thread dumps and I will see if I can
> figure out what is going on.
>
>
> On Sunday, November 6, 2016 10:02 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
>
> I doubt it's GC as you mentioned that the pause is several minutes. Since
> it's reproducible in local mode, can you run the spark application locally
> and once your job is complete (and application appears paused), can you
> take 5 thread dumps (using jstack or jcmd on the local spark JVM process)
> with 1 second delay between each dump and attach them? I can take a look.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 2:21 PM Michael Johnson <mjjohnson.geo@yahoo.com>
> wrote:
>
> Thanks; I tried looking at the thread dumps for the driver and the one
> executor that had that option in the UI, but I'm afraid I don't know how to
> interpret what I saw...  I don't think it could be my code directly, since
> at this point my code has all completed? Could GC be taking that long?
>
> (I could also try grabbing the thread dumps and pasting them here, if that
> would help?)
>
> On Sunday, November 6, 2016 8:36 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
>
> In order to know what's going on, you can study the thread dumps either
> from spark UI or from any other thread dump analysis tool.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 1:31 PM Michael Johnson
> <mjjohnson.geo@yahoo.com.invalid> wrote:
>
> I'm doing some processing and then clustering of a small dataset (~150
> MB). Everything seems to work fine, until the end; the last few lines of my
> program are log statements, but after printing those, nothing seems to
> happen for a long time...many minutes; I'm not usually patient enough to
> let it go, but I think one time when I did just wait, it took over an hour
> (and did eventually exit on its own). Any ideas on what's happening, or how
> to troubleshoot?
>
> (This happens both when running locally, using the localhost mode, as well
> as on a small cluster with four 4-processor nodes each with 15GB of RAM; in
> both cases the executors have 2GB+ of RAM, and none of the inputs/outputs
> on any of the stages is more than 75 MB...)
>
> Thanks,
> Michael
>
>
>
>
>
>
>
>
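
The procedure suggested earlier in the thread (take five thread dumps of the paused JVM with jstack, one second apart) could be scripted roughly as follows. This is only a sketch under some assumptions: the JDK's jstack is on the PATH, the PID of the Spark JVM is already known (for example from jps), and the function names, file names and output directory are made up for illustration rather than taken from the thread.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List


def jstack_dump(pid: int) -> str:
    """Return one thread dump for the JVM with this pid (assumes jstack is on PATH)."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def collect_thread_dumps(
    pid: int,
    count: int = 5,
    delay_seconds: float = 1.0,
    out_dir: str = "thread-dumps",  # hypothetical output location
    dump_fn: Callable[[int], str] = jstack_dump,
) -> List[Path]:
    """Write `count` dumps to out_dir, sleeping delay_seconds between consecutive dumps."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for i in range(count):
        path = target / f"dump-{pid}-{i + 1}.txt"
        path.write_text(dump_fn(pid))
        paths.append(path)
        if i < count - 1:  # no need to wait after the last dump
            time.sleep(delay_seconds)
    return paths
```

Calling collect_thread_dumps(pid) once the job's final log lines have printed and the application appears paused would leave five numbered text files that can be attached to a reply. Threads that show the same stack in every dump are usually the first place to look. The dump_fn parameter exists only so that a different tool (jcmd, for instance) can be swapped in.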
