spark-user mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Worker hangs with 100% CPU in Standalone cluster
Date Thu, 16 Jan 2014 20:08:17 GMT
Thanks for following up and explaining this one! Definitely something other
users might run into...


On Thu, Jan 16, 2014 at 5:58 AM, Grega Kešpret <grega@celtra.com> wrote:

> Just to follow up: we have since traced the problem to our application
> code (not Spark). In some cases there was an infinite loop in the
> bucket-chain lookup of Scala's mutable HashTable, where an entry's next
> pointer pointed at itself. It was probably caused by incorrect hashCode()
> and equals() methods on the objects we were storing.
>
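A minimal sketch of why an entry whose next points at itself shows up as a task stuck at 100% CPU. It models one hash bucket as a linked chain of entries; `Entry`, `ChainDemo`, `findNaive`, `hasCycle` and `findSafe` are hypothetical names, and this is a simplified model, not the source of Scala's HashTable. A lookup that follows `next` until it finds the key or reaches null never terminates once it reaches a self-looping entry without having found the key, so the thread spins without progress. The cycle check (Floyd's tortoise-and-hare) is only a diagnostic aid; it says nothing about how the chain was corrupted in the first place.

```scala
// Hypothetical, simplified model of one bucket in a chained hash table.
// This is NOT the implementation of scala.collection.mutable.HashTable.
final class Entry(val key: String, var value: Int, var next: Entry)

object ChainDemo {
  // Walks the chain until the key is found or the chain ends (null).
  // If the walk reaches an entry whose next points back at itself and that
  // entry does not hold the key, this loop never exits: 100% CPU, no progress.
  def findNaive(head: Entry, key: String): Entry = {
    var e = head
    while (e != null && e.key != key) e = e.next
    e
  }

  // Floyd's tortoise-and-hare: true if following next from head
  // eventually revisits an entry (a self-loop is the shortest such cycle).
  def hasCycle(head: Entry): Boolean = {
    var slow = head
    var fast = head
    while (fast != null && fast.next != null) {
      slow = slow.next
      fast = fast.next.next
      if (slow eq fast) return true
    }
    false
  }

  // Diagnostic lookup: fail loudly on a corrupted chain instead of spinning.
  def findSafe(head: Entry, key: String): Option[Entry] = {
    if (hasCycle(head))
      throw new IllegalStateException(s"cycle in bucket chain while looking up $key")
    Option(findNaive(head, key))
  }

  def main(args: Array[String]): Unit = {
    val b = new Entry("b", 2, null)
    val a = new Entry("a", 1, b)           // chain: a -> b -> null
    println(findSafe(a, "b").map(_.value)) // prints Some(2)

    a.next = a                             // corrupt the chain: a -> a
    println(hasCycle(a))                   // prints true
    // findNaive(a, "b") would now loop forever.
  }
}
```

The cycle check walks the chain once more before every lookup, so it fits a debugging build rather than a hot path.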
> Milos, we also run the Master on a node separate from the Worker nodes.
> Could someone from the Spark team comment on that?
>
> Grega
> --
> *Grega Kešpret*
> Analytics engineer
>
> Celtra — Rich Media Mobile Advertising
> celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>
>
>
> On Thu, Jan 16, 2014 at 2:46 PM, Milos Nikolic <milos.nikolic83@gmail.com> wrote:
>
>> Hello,
>>
>> I’m facing the same (or a similar) problem. In my case, the last two tasks
>> hang in a map function that follows sc.sequenceFile(…). It happens
>> intermittently (more often with TorrentBroadcast than with HttpBroadcast),
>> and after a restart the job runs fine.
>>
>> The problem always occurs on the same node: the one that acts as both the
>> master and one of the workers. Once I made this node master-only (i.e.,
>> removed it from conf/slaves), the problem went away.
>>
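For reference, a conf/slaves file for the layout Milos describes might look like the fragment below. In standalone mode this file lists the hosts on which the launch scripts (e.g. start-slaves.sh) start a Worker, one hostname per line, so leaving the Master's host out makes that machine master-only. The hostnames are hypothetical, and the Master host (say, master-host) is deliberately absent. This only illustrates the workaround; it does not settle whether co-locating the Master and a Worker is unsupported.

```text
worker-host-1
worker-host-2
worker-host-3
```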
>> Does that mean that the master and workers have to be on separate nodes?
>>
>> Best,
>> Milos
>>
>>
>> On Jan 6, 2014, at 5:44 PM, Grega Kešpret <grega@celtra.com> wrote:
>>
>> Hi,
>>
>> Several times a day, we see one worker in our Standalone cluster hang at
>> 100% CPU on the last task and stop making progress. After we restart the
>> job, it completes successfully.
>>
>> We are using Spark v0.8.1-incubating.
>>
>> Attached please find jstack logs of the Worker and
>> CoarseGrainedExecutorBackend JVM processes.
>>
>> Grega
>>
>>
>>
>
