spark-user mailing list archives

From Sean Owen
Subject Re: Big performance difference between "client" and "cluster" deployment mode; is this expected?
Date Wed, 31 Dec 2014 18:21:40 GMT
-dev, +user

A decent guess: does your 'save' function collect data back to the
driver? And are you running this from a machine that's not in your
Spark cluster? If so, in client mode the driver runs on that machine,
so you're shipping data back to a more distant machine than in cluster
mode, where the driver runs inside the cluster. That could explain
the bottleneck.
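
To make the distinction concrete, here is a minimal sketch of two ways a
'save' function could be written. Both function names and the local file
path argument are hypothetical (the original 'save' is not shown in this
thread), and the code assumes Spark is on the classpath. Only the first
variant moves data through the driver, so only it would be sensitive to
where the driver runs.

```scala
import java.io.PrintWriter
import org.apache.spark.rdd.RDD

// Hypothetical variant A: collect() pulls every record to the driver,
// which then writes a single local file. In client mode the driver is
// the machine that launched the job, so all data travels to it.
def saveViaDriver(rdd: RDD[String], localPath: String): Unit = {
  val rows: Array[String] = rdd.collect()
  val out = new PrintWriter(localPath)
  try rows.foreach(line => out.println(line))
  finally out.close()
}

// Hypothetical variant B: each executor writes its own partitions
// directly to the target path; no record data passes through the driver.
def saveDistributed(rdd: RDD[String], path: String): Unit =
  rdd.saveAsTextFile(path)
```

If 'save' looks like variant B, the driver's location should matter much
less for write throughput, and the cause is more likely elsewhere.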

On Wed, Dec 31, 2014 at 4:12 PM, Enno Shioji wrote:
> Hi,
> I have a very, very simple streaming job. When I deploy this on the exact
> same cluster, with the exact same parameters, I see a big (40%) performance
> difference between "client" and "cluster" deployment mode. This seems a bit
> surprising. Is this expected?
> The streaming job is:
>     val msgStream = kafkaStream
>       .map { case (k, v) => v}
>       .map(DatatypeConverter.printBase64Binary)
>       .foreachRDD(save)
>       .saveAsTextFile("s3n://some.bucket/path", classOf[LzoCodec])
> I tried several times, but the job deployed in "client" mode can only
> write at 60% of the throughput of the job deployed in "cluster" mode, and
> this happens consistently. I'm logging at INFO level, but my application
> code doesn't log anything, so it's only Spark logs. The logs I see in
> "client" mode don't seem like a crazy amount.
> The setup is:
> spark-ec2 [...] \
>   --copy-aws-credentials \
>   --instance-type=m3.2xlarge \
>   -s 2 launch test_cluster
> And all the deployment was done from the master machine.
