spark-user mailing list archives

From Allen Chang
Subject Monitoring spark dis-associated workers
Date Wed, 11 Jun 2014 01:15:59 GMT
We're running into an issue where the master periodically loses connectivity
with workers in the Spark cluster. We believe the issue tends to show up
when the cluster is under heavy load, but we haven't pinned down exactly when
it happens. I've seen one or two other messages on this list about the same
issue, but no one seems to have identified the actual bug.

So, to work around the issue, we'd like to programmatically monitor the
number of workers connected to the master and restart the cluster when the
master loses track of some of its workers. Any ideas on how to write such a
health check?
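
For illustration, here is a rough sketch of the kind of check we have in
mind. It assumes the standalone master's web UI serves a JSON status page at
/json (typically on port 8080) and that each entry in its "workers" list
carries a "state" field with the value "ALIVE" for connected workers; please
verify both against your Spark version. The function names, the master URL,
and the expected worker count are hypothetical.

```python
import json
import urllib.request


def count_alive_workers(status_json: str) -> int:
    """Count workers reported as ALIVE in the master's JSON status.

    Assumes a top-level "workers" list whose entries have a "state" field;
    this layout is an assumption and may differ between Spark versions.
    """
    status = json.loads(status_json)
    return sum(1 for w in status.get("workers", []) if w.get("state") == "ALIVE")


def is_healthy(status_json: str, expected_workers: int) -> bool:
    """Return True if at least `expected_workers` workers are ALIVE."""
    return count_alive_workers(status_json) >= expected_workers


def fetch_master_status(master_ui_url: str, timeout: float = 10.0) -> str:
    """Fetch the raw JSON status from the master web UI (assumed /json path)."""
    url = master_ui_url.rstrip("/") + "/json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_cluster(master_ui_url: str, expected_workers: int) -> int:
    """Return 0 if healthy, 1 if workers are missing, 2 if the master is unreachable."""
    try:
        status_json = fetch_master_status(master_ui_url)
    except OSError:
        # urllib.error.URLError is a subclass of OSError, as are socket timeouts.
        return 2
    return 0 if is_healthy(status_json, expected_workers) else 1
```

Something like check_cluster("http://spark-master:8080", 10) could run from
cron, with a non-zero result triggering the restart (for example through the
standalone sbin/stop-all.sh and sbin/start-all.sh scripts). Requiring several
consecutive failures before restarting would probably avoid bouncing the
cluster on a transient blip.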

