spark-user mailing list archives

From Milos Nikolic <>
Subject Re: Worker hangs with 100% CPU in Standalone cluster
Date Thu, 16 Jan 2014 13:46:24 GMT

I’m facing the same (or a similar) problem. In my case, the last two tasks hang in a map function
following sc.sequenceFile(…). It happens intermittently (more often with TorrentBroadcast than
with HttpBroadcast), and after restarting the job it completes fine.
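Since the hang shows up more often with TorrentBroadcast, one workaround is to pin the broadcast implementation back to HTTP broadcast. A sketch, assuming the stock Spark 0.8.x property and class names (check them against your build):

```shell
# Workaround sketch: force HTTP broadcast instead of TorrentBroadcast.
# spark.broadcast.factory and the HttpBroadcastFactory class name are as
# documented for Spark 0.8.x; adjust if your version differs.
# Add to conf/spark-env.sh on all nodes, then restart the cluster:
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
```

This only sidesteps the symptom; it doesn't explain why the co-located master/worker node hangs.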

The problem always occurs on the same node: the one acting as both master and worker. Once that
node became master-only (i.e., after I removed it from conf/slaves), the problem was gone.
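For reference, the change described above can be sketched like this (hypothetical hostnames; conf/slaves in a standalone cluster just lists one worker host per line):

```shell
# Hypothetical example: "node1" runs the master and is also listed as a worker.
# Dropping it from conf/slaves makes node1 master-only; workers stay on the rest.
printf 'node1\nnode2\nnode3\n' > slaves        # stand-in for conf/slaves
grep -v '^node1$' slaves > slaves.tmp && mv slaves.tmp slaves
cat slaves                                     # now lists only node2 and node3
```

After editing the real conf/slaves, the cluster has to be restarted (e.g. sbin stop/start scripts) for the change to take effect.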

Does that mean that the master and workers have to be on separate nodes? 


On Jan 6, 2014, at 5:44 PM, Grega Kešpret <> wrote:

> Hi,
> we are seeing, several times a day, one worker in a Standalone cluster hang with 100%
CPU on the last task and make no further progress. After we restart the job, it completes successfully.
> We are using Spark v0.8.1-incubating.
> Attached please find jstack logs of Worker and CoarseGrainedExecutorBackend JVM processes.
> Grega
> --
> <celtra_logo.png>	
> Grega Kešpret
> Analytics engineer
> Celtra — Rich Media Mobile Advertising
> | @celtramobile
> <>
