spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Troubleshooting and how to interpret the logs
Date Thu, 03 Oct 2013 21:00:26 GMT
Hi Ashish,

Those "removing" messages mean that the node in question didn't communicate with your application
for 45 seconds. Most likely the executor process on the node died, though there's also a chance
that it was doing a super-long garbage collection or that there was a network problem. Look
at the logs on the node to see if it crashed due to an exception or the OS killed it for some
reason. For garbage collection debugging, you can turn on -XX:+PrintGCDetails in SPARK_JAVA_OPTS
to log the lengths of GC pauses.

Matei

On Oct 3, 2013, at 1:10 PM, Ashish Rangole <arangole@gmail.com> wrote:

> Hi,
> 
> Trying to figure out what does it mean when the application (driver program) logs end
with the the lines like the ones below. This is with the application running on Spark 0.8.0
on EC2.
> 
> Any help will be greatly appreciated.
> 
> Thanks!
> 
> 
> 13/10/03 16:17:33 INFO cluster.ClusterTaskSetManager: Finished TID 1744 in 1183507 ms
on ip-10-232-80-206.ec2.internal (progress: 46/60)
> 13/10/03 16:17:33 INFO scheduler.DAGScheduler: Completed ResultTask(4, 20)
> 13/10/03 16:17:57 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(1,
ip-10-170-16-83.ec2.internal, 46907, 0) with no recent
> heart beats: 68685ms exceeds 45000ms
> 13/10/03 16:18:57 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(0,
ip-10-232-27-176.ec2.internal, 55654, 0) with no recent
>  heart beats: 95376ms exceeds 45000ms
> 13/10/03 16:19:13 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering
block manager ip-10-232-27-176.ec2.internal:55654 with 19.3 GB RAM
> 13/10/03 16:21:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Added rdd_7_293
in memory on ip-10-232-27-176.ec2.internal:55654 (size: 1975.7
>  KB, free: 19.2 GB)
> 13/10/03 16:21:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Added rdd_7_293
in memory on ip-10-232-27-176.ec2.internal:55654 (size: 1975.7
>  KB, free: 19.2 GB)
> 13/10/03 16:22:57 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(0,
ip-10-232-27-176.ec2.internal, 55654, 0) with no recent
>  heart beats: 78761ms exceeds 45000ms
> 


Mime
View raw message