spark-user mailing list archives

From Milos Nikolic <milos.nikoli...@gmail.com>
Subject Re: Worker hangs with 100% CPU in Standalone cluster
Date Thu, 16 Jan 2014 13:46:24 GMT
Hello,

I’m facing the same (or a similar) problem. In my case, the last two tasks hang in a map function
following sc.sequenceFile(…). It happens from time to time (more often with TorrentBroadcast
than with HttpBroadcast), and after restarting the job it completes fine.
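Since the hang shows up more often with TorrentBroadcast, one workaround while debugging is to pin the HTTP implementation. This is a sketch for Spark 0.8.x, which reads configuration from Java system properties (typically set through SPARK_JAVA_OPTS in conf/spark-env.sh); spark.broadcast.factory selects the broadcast implementation:

```shell
# Sketch, assuming Spark 0.8.x: force HttpBroadcast instead of TorrentBroadcast
# by setting the broadcast factory system property in conf/spark-env.sh.
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
```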

The problem always happens on the same node, the one that acts as both the master and a worker.
Once this node became master-only (i.e., I removed it from conf/slaves), the problem
was gone.
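For reference, conf/slaves in a standalone cluster lists one worker hostname per line, so making a node master-only just means dropping its entry. A minimal sketch, using "master-node" as a hypothetical hostname and a stand-in file:

```shell
# Sketch: conf/slaves lists one worker hostname per line.
# "master-node" is a hypothetical hostname standing in for the dual-role node.
printf 'master-node\nworker-1\nworker-2\n' > slaves.example   # stand-in for conf/slaves
grep -v '^master-node$' slaves.example > slaves.filtered      # drop the master's entry
cat slaves.filtered
```

After this, only worker-1 and worker-2 remain as workers, and the master node no longer runs a worker daemon.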

Does that mean the master and the workers have to run on separate nodes?

Best,
Milos


On Jan 6, 2014, at 5:44 PM, Grega Kešpret <grega@celtra.com> wrote:

> Hi,
> 
> several times a day we see one worker in a Standalone cluster hang with 100%
> CPU on the last task and fail to proceed. After we restart the job, it completes successfully.
> 
> We are using Spark v0.8.1-incubating.
> 
> Attached please find jstack logs of Worker and CoarseGrainedExecutorBackend JVM processes.
> 
> Grega
> --
> Grega Kešpret
> Analytics engineer
> 
> Celtra — Rich Media Mobile Advertising
> celtra.com | @celtramobile
> <logs.zip>

