spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Setting spark.akka.frameSize
Date Sat, 24 May 2014 01:19:30 GMT
Hi Matt,

First of all, in Spark 1.0 there's logging when the message exceeds the
frame size, so you won't have silent hangs in this scenario anymore.  See
https://issues.apache.org/jira/browse/SPARK-1244 and
https://github.com/apache/spark/pull/147/files for the details.

As for the proper way to set spark.akka.frameSize on a standalone cluster, I
always thought the normal way was the one documented at
http://spark.apache.org/docs/latest/configuration.html, i.e. setting it on
the SparkConf object before you instantiate the SparkContext.  No further
propagation on the workers should be necessary, since the
CoarseGrainedExecutorBackends they start up are seeded with that context's
configuration at initialization.
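
For example, something like this on the driver (just a sketch; the app name
and master URL below are placeholders, not from your setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // Set the frame size (in MB) before the SparkContext is created;
    // executors launched for this application inherit the setting.
    val conf = new SparkConf()
      .setAppName("MyApp")                  // placeholder app name
      .setMaster("spark://master:7077")     // placeholder standalone master URL
      .set("spark.akka.frameSize", "128")   // max Akka message size, in MB
    val sc = new SparkContext(conf)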

You can also check the Executors tab on your application's web UI (port 4040)
to see whether the configuration item was picked up.

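If you'd rather verify it from code than through the UI, you can read the
value back off the context's configuration (a minimal sketch, reusing the sc
from above):

    // Prints "128" if the setting took effect on the driver side.
    println(sc.getConf.get("spark.akka.frameSize"))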

Are you still observing stability issues with the job even with those
settings?

Cheers!
Andrew



On Fri, May 23, 2014 at 6:08 PM, MattSills <matthew.sills@gmail.com> wrote:

> Hi all,
>
> Configuration: Standalone 0.9.1-cdh4 cluster, 7 workers per node, 32 GB per
> worker
>
> I'm running a job on a Spark cluster, and running into some strange
> behavior. After a while, the Akka frame sizes exceed 10 MB, and then the
> whole job seizes up. I set "spark.akka.frameSize" to 128 in the SparkConf
> used to create the SparkContext (and also set it as a Java system property
> on the driver, for good measure). After this, the program didn't hang, but
> failed immediately, logging an error message like the following:
>   (on the master):
>     14/05/20 21:49:50 INFO SparkDeploySchedulerBackend: Executor 1
> disconnected, so removing it
>     14/05/20 21:49:50 ERROR TaskSchedulerImpl: Lost executor 1 on [...]:
> remote Akka client disassociated
>   (on the workers):
>     14/05/20 21:50:25 WARN SparkDeploySchedulerBackend: Disconnected from
> Spark cluster! Waiting for reconnection...
>     14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Shutting down all
> executors
>     14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
>     14/05/20 21:50:25 INFO AppClient: Stop request to Master timed out; it
> may already be shut down.
>
> After lots of fumbling around, I ended up adding
> "-Dspark.akka.frameSize=128" to SPARK_JAVA_OPTS in spark-env.sh, under the
> theory that the workers couldn't read the larger Akka messages. This
> /seems/ to have made things work, but I'm still a little scared. Is this
> the standard way to set the max Akka frame size, or is there a way to set
> it from the driver and have it propagate to the workers?
>
> Thanks,
> Matt
>
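
For reference, the spark-env.sh approach Matt describes would look roughly
like this on each worker (a sketch; SPARK_JAVA_OPTS is the pre-1.0 mechanism
for this, and the file is typically conf/spark-env.sh):

    # conf/spark-env.sh -- read by the standalone daemons at startup
    SPARK_JAVA_OPTS="-Dspark.akka.frameSize=128 $SPARK_JAVA_OPTS"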
