spark-user mailing list archives

From Aaron Jackson <ajack...@pobox.com>
Subject Heavy Stage Concentration - Ends With Failure
Date Wed, 20 Jul 2016 00:16:47 GMT
Hi,

I have a cluster with 15 nodes, of which 5 are HDFS nodes.  I kick off a job
that creates some 120 stages.  Eventually, the active and pending stages
reduce down to a small bottleneck and, without fail, the 10 (or so) running
tasks are always allocated to the same executor on the same host.

Sooner or later, that executor runs out of memory ... or some other resource.
It falls over and then the tasks are reallocated to another executor.

Why do we see such heavy concentration of tasks onto a single executor when
other executors are free?  Were the tasks assigned to an executor when the
job was decomposed into stages?
